Putting the 'Tech' in Techno: Detecting
Genres and Trendsetters in Electronic
Music by Dirichlet Processes
Matthew Scott Silver
Submitted in partial fulfillment
of the requirements for the degree of
Bachelor of Science in Engineering
Department of Operations Research and Financial Engineering
Princeton University
Adviser: Ramon Van Handel
June 2016
I hereby declare that I am the sole author of this thesis.
I authorize Princeton University to lend this thesis to other institutions or
individuals for the purpose of scholarly research.
Matthew Silver
I further authorize Princeton University to reproduce this thesis by photocopying or
by other means, in total or in part, at the request of other institutions or individuals
for the purpose of scholarly research.
Matthew Silver
Abstract
This thesis provides a foundation of code and models to mathematically analyze the evolution of Electronic Music (EM) over time. Using chronologically ordered data from the Million Song Dataset, it applies a Dirichlet Process Gaussian Mixture Model to assign songs to clusters based on pitch and timbre data, without any prior assumptions about the clusters. By examining the characteristic sounds of songs in each cluster, the following conclusions are reached:
1. Which artists and songs were most innovative for their time.
2. Potential new ways in which the genealogy of, and relations between, EM genres can be imagined.
Finally, this thesis evaluates the strengths and weaknesses of the model used and suggests future work that can be done to improve upon it.
Acknowledgements
I would like to thank Professor Ramon Van Handel for advising me on my thesis. You helped me figure out how to narrow down my goals into a concrete topic and provided useful input on how to model and frame my problem effectively. I would also like to thank the Princeton ORFE department for providing funding to download and manage the dataset I used for this project. Michael Bino and the Computational Science and Engineering Support group (CSES) were incredibly helpful in setting up and running my programs on the Princeton servers. Without your help I would have had a much harder time getting my 300GB dataset of music to play nice. I would also like to thank Jeffrey Scott Dwoskin for providing the LaTeX template from which I wrote this thesis. And finally, I would like to thank my family and friends, especially Lucas and Kathryn, for providing continuous support and feedback. The work we all poured into our theses is incredible, and we've made it through this sometimes rocky journey in the greatest university of all.

On a personal note: regardless of whether you are a current Princeton undergraduate or are just interested in my work, push yourself beyond your comfort zone, and don't let grades or other people's opinions get in your way. Take classes and join new groups that reflect your passions. At the same time, love yourself. Take care of your body and have some fun without feeling guilty. And finally, form great relationships. While Princetonians sometimes appear hypercompetitive and forced, they are genuinely sweet and brilliant people who you will treasure for life. These four years at Princeton have gone by in a flash, and in the whirlwind of highs and lows I've gone through, these are the most important lessons I've learned.
To my parents
Contents
Abstract iii
Acknowledgements iv
List of Tables viii
List of Figures ix
1 Introduction 1
1.1 Background Information 1
1.2 Literature Review 3
1.3 The Dataset 10
2 Mathematical Modeling 12
2.1 Determining Novelty of Songs 12
2.2 Feature Selection 14
2.3 Collecting Data and Preprocessing Selected Features 20
2.3.1 Collecting the Data 20
2.3.2 Pitch Preprocessing 21
2.3.3 Timbre Preprocessing 25
3 Results 27
3.1 Methodology 27
3.2 Findings 29
3.2.1 α = 0.05 29
3.2.2 α = 0.1 33
3.2.3 α = 0.2 38
3.3 Analysis 46
4 Conclusion 53
4.1 Design Flaws in Experiment 53
4.2 Future Work 55
4.3 Closing Remarks 56
A Code 57
A.1 Pulling Data from the Million Song Dataset 57
A.2 Calculating Most Likely Chords and Timbre Categories 58
A.3 Code to Compute Timbre Categories 60
A.4 Helper Methods for Calculations 61
Bibliography 68
List of Tables
3.1 Song cluster descriptions for α = 0.05 33
3.2 Song cluster descriptions for α = 0.1 38
3.3 Song cluster descriptions for α = 0.2 45
List of Figures
1.1 A user's taste profile generated by Spotify 4
1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975) 8
2.1 scikit-learn example of GMM vs. DPGMM and tuning of α 15
2.2 Number of Electronic Music Songs in Million Song Dataset from Each Year 26
3.1 Song year distributions for α = 0.05 31
3.2 Timbre and pitch distributions for α = 0.05 32
3.3 Song year distributions for α = 0.1 35
3.4 Timbre and pitch distributions for α = 0.1 37
3.5 Song year distributions for α = 0.2 41
3.6 Timbre and pitch distributions for α = 0.2 44
Chapter 1
Introduction
1.1 Background Information
Electronic Music (EM) is an increasingly popular genre of music with an immense presence and influence on modern culture. Because the genre is new as a whole, and is arguably more loosely structured than other genres - technology has enabled the creation of a wide range of sounds and easy blending of existing and new sounds alike - formal analysis, especially mathematical analysis, of the genre is fairly limited and has only begun growing in the past few years. As a fan of EM, I am interested in exploring how the genre has evolved over time. More specifically, my goal with this project was to design some structure or model that could help me identify which EM artists have contributed the most stylistically to the genre. Oftentimes famous EM artists do not create novel-sounding music but rather popularize an existing style, and the motivation of this study is to distinguish those who have stylistically contributed the most to the EM scene from those who have merely popularized aspects of it.

As the study progressed, the manner in which I constructed my model lent itself to a second goal of the thesis: finding new ways in which EM genres can be imagined.
While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely due to existing for a shorter amount of time and being less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not exist until 2012, when Professor Mark J. Butler from Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that allow a user to relate every genre to each other and easily observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped by more general genres and represents chronological evolutions by connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative. While each subgenre contains an explanation of typical rhythm and sounds and includes well-known songs indicative of the style, the guide was created by someone drawing on historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach towards imagining the entire landscape of EM. The results may confirm Ishkur's Guide's findings, in which case his guide is given additional merit with mathematical evidence, or they may differ, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically-based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that historically makes sense. On the other hand, my model is history-agnostic and may not capture the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and perhaps reveals different musical styles that are more similar to each other than history would otherwise imply. This way of thinking about music genres, while unconventional, is another way of imagining EM.
The practice of quantitatively analyzing music has exploded in the last decade, thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review, I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and The Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what that research can contribute.
1.2 Literature Review
The analysis of quantitative music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. First looking at the consumer-based research, Spotify and Pandora are two of the most prominent groups, and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists, or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by The Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the cluster come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music, as well as training various algorithms on a regular basis by "listening" to new songs [4][5].
Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres of music. This may serve as a good starting point to compare my final results to, but my study aims to be as context-free as possible by attaching no preconceived notions of music styles or genres, instead looking at features that could be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking where skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.
In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010, spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has been becoming louder and less diverse:
    The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase all songs I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
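The binary discretization idea can be sketched in a few lines. This is an illustrative sketch only: the threshold value and function names are assumptions for demonstration, not the ones used in the cited study.

```python
# Hypothetical sketch of binary pitch discretization: each time segment's
# 12 chroma strengths (values in [0, 1]) are thresholded so that a song
# becomes a sequence of binary on/off pitch vectors. THRESHOLD is an
# illustrative choice, not the study's actual value.
THRESHOLD = 0.5

def binarize_segment(chroma):
    """Map 12 relative pitch strengths to a binary on/off vector."""
    return [1 if strength >= THRESHOLD else 0 for strength in chroma]

def binarize_song(segments):
    """Apply the discretization to every time segment of a song."""
    return [binarize_segment(seg) for seg in segments]

# One segment: C and E sound strongly, everything else is weak.
segment = [1.0, 0.1, 0.2, 0.1, 0.8, 0.0, 0.1, 0.3, 0.2, 0.1, 0.0, 0.1]
print(binarize_segment(segment))  # [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

After this step, songs of different lengths and dynamics can be compared on equal footing as sequences of binary vectors.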
While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example: What styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)
While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of those levels. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or dictionaries of pitch and timbre terms that all songs can be compared to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, music sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords, but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song.
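The chord-change table can be sketched as a simple transition count. Note this sketch counts absolute chord pairs for readability; the actual pipeline works with relative (transposition-invariant) chord changes, and the chord labels here are placeholders.

```python
from collections import Counter

# Illustrative sketch (not Mauch's actual code): given the most likely
# chord at each time segment, count transitions between consecutive
# chords and normalize to get the song's chord change frequency table.
def chord_change_frequencies(chords):
    transitions = Counter(zip(chords, chords[1:]))
    total = sum(transitions.values())
    return {pair: count / total for pair, count in transitions.items()}

# Toy chord sequence with 5 transitions; ('C', 'G') occurs twice.
song = ["C", "G", "Am", "F", "C", "G"]
print(chord_change_frequencies(song))  # ('C', 'G') maps to 0.4
```

Because the table is normalized, songs of different lengths yield comparable feature vectors.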
Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords by which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k=1 to k=N clusters, where N is a large number, running the GMM with each prior assumption of k clusters and computing the Bayesian Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
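The BIC-based selection loop described above can be sketched with current scikit-learn APIs. This is a hedged illustration on synthetic data: the upper bound N_MAX, the random seed, and the two planted 12-dimensional "timbre" clusters are all arbitrary choices for demonstration, not values from Mauch's study.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for timbre data: two well-separated clusters of
# 12-dimensional MFCC-like vectors.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 12)),  # first "timbre" cluster
    rng.normal(loc=5.0, scale=1.0, size=(100, 12)),  # second "timbre" cluster
])

# Fit a GMM for each candidate k and record its BIC; keep the k that
# minimizes BIC, as in the model-selection loop described above.
N_MAX = 6
bics = []
for k in range(1, N_MAX + 1):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics.append(gmm.bic(X))

best_k = int(np.argmin(bics)) + 1  # argmin is 0-indexed, k starts at 1
print(best_k)  # expected to recover the 2 planted clusters
```

The selected model's `means_` attribute then gives the mean timbre value for each of the 12 components in every cluster, matching the lexicon description above.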
For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and the most common timbre rhythms, and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples covering all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as for timbre, but also pay careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources, within this thesis, to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I needed access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms such as iTunes offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus of the song, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations such as The Echo Nest, Musicbrainz, 7digital, and Last.fm have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be read with standard HDF5 libraries. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem - how to mathematically determine which songs were unique for their time - requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may end up with clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
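The "rich get richer" dynamic can be illustrated with the Chinese Restaurant Process view of a DP, in which the next data point joins an existing cluster with probability proportional to that cluster's size, and opens a new cluster with probability proportional to α. This is a sketch of the assignment probabilities only, not the clustering used in this thesis, which also conditions on song features.

```python
import random

def crp_cluster_sizes(n_points, alpha, seed=42):
    """Simulate cluster assignments under a Chinese Restaurant Process:
    point i joins an existing cluster c with probability |c| / (i + alpha),
    or opens a new cluster with probability alpha / (i + alpha)."""
    random.seed(seed)
    sizes = []
    for i in range(n_points):
        # Draw r over total weight: existing sizes sum to i, plus alpha
        # reserved for a new cluster.
        r = random.uniform(0, i + alpha)
        cumulative = 0.0
        for c, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                sizes[c] += 1  # rich get richer: big clusters attract more
                break
        else:
            sizes.append(1)  # new clusters become rarer as i grows
    return sizes

# Higher alpha tends to produce more clusters on the same number of points.
print(len(crp_cluster_sizes(1000, alpha=0.05)))
print(len(crp_cluster_sizes(1000, alpha=2.0)))
```

Running this with increasing α shows the cluster count growing roughly like α log n, which is the behavior the tuning discussion above relies on.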
The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen in Figure 2.1, on the right side, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being grouped under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
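The three arguments described above (α, the N-by-m data, and the cluster upper bound) can be sketched as follows. Note this is not the thesis's original code: the `DPGMM` class used in 2016 has since been removed from scikit-learn, and the modern equivalent is `BayesianGaussianMixture` with a Dirichlet Process (stick-breaking) prior; the data here is a synthetic placeholder.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic placeholder for the song feature matrix: N songs x m features,
# matching the nested-list layout described above.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))

# Modern stand-in for the old sklearn DPGMM class: a Dirichlet Process
# (stick-breaking) mixture with concentration parameter alpha and an
# upper bound on the number of clusters.
dpgmm = BayesianGaussianMixture(
    n_components=50,                                   # cluster upper bound
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.05,                   # alpha
    random_state=0,
)
labels = dpgmm.fit_predict(X)

# The model uses at most n_components clusters; unneeded ones get
# negligible weight under the stick-breaking prior.
print(len(set(labels)))
```

Raising `weight_concentration_prior` makes the model more willing to spread points across additional clusters, which mirrors the α tuning discussed above.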
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters,

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as an HDF5 file containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainzorg ID
for this artists is db9)
artist_mbtags shape = (4) (this artist received 4 tags on musicbrainzorg)
artist_mbtags_count shape = (4)
(raw tag count of the 4 tags this artist received on musicbrainzorg)
artist_name Rick Astley (artist name)
artist_playmeid 1338 (the ID of that artist on the service playmecom)
artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest
(number between 0 and 1))
artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest
(number between 0 and 1))
15
audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for
the analysis by The Echo Nest)
bars_confidence shape = (99) (confidence value (between 0 and 1) associated
with each bar by The Echo Nest)
bars_start shape = (99) (start time of each bar according to The Echo Nest this
song has 99 bars)
beats_confidence shape = (397))
confidence value (between 0 and 1) associated with each beat by The Echo Nest
beats_start shape = (397) (start time of each beat according to The Echo Nest
this song has 397 beats)
danceability 00 (danceability measure of this song according to The Echo Nest
(between 0 and 1 0 =gt not analyzed))
duration 21169587 (duration of the track in seconds)
end_of_fade_in 0139 (time of the end of the fade in at the beginning of the
song according to The Echo Nest)
energy 00 (energy measure (not in the signal processing sense) according to The
Echo Nest (between 0 and 1 0 = not analyzed))
key 1 (estimation of the key the song is in by The Echo Nest)
key_confidence 0324 (confidence of the key estimation)
loudness -775 (general loudness of the track)
mode 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence 0434 (confidence of the mode estimation)
release Big Tunes - Back 2 The 80s (album name from which the track was taken
some songs tracks can come from many albums we give only one)
release_7digitalid 786795 (the ID of the release (album) on the service 7digi-
talcom)
sections_confidence shape = (10) (confidence value (between 0 and 1) associated
16
with each section by The Echo Nest)
sections_start shape = (10) (start time of each section according to The Echo
Nest this song has 10 sections)
segments_confidence shape = (935,) (confidence value (between 0 and 1) associated
with each segment by The Echo Nest)
segments_loudness_max shape = (935,) (max loudness during each segment)
segments_loudness_max_time shape = (935,) (time of the max loudness during
each segment)
segments_loudness_start shape = (935,) (loudness at the beginning of each
segment)
segments_pitches shape = (935, 12) (chroma features for each segment (normalized
so max is 1))
segments_start shape = (935,) (start time of each segment (= musical event or
onset) according to The Echo Nest; this song has 935 segments)
segments_timbre shape = (935, 12) (MFCC-like features for each segment)
similar_artists shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar
to Rick Astley according to The Echo Nest)
song_hotttnesss 0.864248830588 (according to The Echo Nest, when downloaded
(in December 2010), this song had a 'hotttnesss' of 0.86 (on a scale of 0 to 1))
song_id SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can
be associated with many tracks (with very slight audio differences))
start_of_fade_out 198.536 (start time of the fade-out, in seconds, at the end
of the song according to The Echo Nest)
tatums_confidence shape = (794,) (confidence value (between 0 and 1) associated
with each tatum by The Echo Nest)
tatums_start shape = (794,) (start time of each tatum according to The Echo
Nest; this song has 794 tatums)
tempo 113.359 (tempo in BPM according to The Echo Nest)
time_signature 4 (time signature of the song according to The Echo Nest, i.e.
the usual number of beats per bar)
time_signature_confidence 0.634 (confidence of the time signature estimation)
title Never Gonna Give You Up (song title)
track_7digitalid 8707738 (the ID of this song on the service 7digital.com)
track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track,
on which the analysis was done)
year 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most
likely yield meaningful results yet also be simple and make sense to the average
person. The definition of "meaningful" results is subjective, as every music listener
has his or her own opinion as to what constitutes different types of music, but some
common features most people tend to differentiate songs by are pitch, rhythm, and
the types of instruments used. The following specific fields provided in each song
object fall under these three terms:
Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or
note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat

• time_signature: the time signature of the song

• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like
features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between
songs, since it reveals the patterns of notes that occur. Additionally, other research
papers that quantitatively examine songs, like Mauch's, look at pitch and employ a
procedure that allows all songs to be compared with the same metric. Likewise,
timbre is intuitively a reliable differentiating feature, since it captures the differences
between sounds that share the same pitch but have distinct tones. Therefore,
segments_timbre is another feature that is considered in each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these
features appear to be useful, as they indicate the rhythm of a song in one way or
another. However, none of these features are as useful as the pitch and timbre
features. While tempo is one factor in differentiating genres of EDM and music in
general, tempo alone is not a driving force of musical innovation. Certain genres
of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos,
but the tempo is supplemented with a sound unique to the genre. Conceiving new
arrangements of pitches, combining instruments in new ways, and inventing new
types of sounds are novel, but speeding up or slowing down existing sounds is not.
Including tempo as a feature could actually add noise to the model, since many genres
overlap in their tempos. And finally, tempo is measured indirectly when the pitch
and timbre features are normalized for each song: everything is measured in units of
"per second," so faster songs will have higher quantities of pitch and timbre features
each second. Time signature can be dismissed from the candidate features for the
same reason as tempo: many genres share the same time signature, and including
it in the feature set would only add more noise. beats_start looks like a more
promising feature, since like segments_pitches and segments_timbre it consists of
a vector of values. However, difficulties arise when we begin to think about how exactly
we can utilize this information. Since each song varies in length, we need a way to
compare songs of different durations on the same level. One approach could be to
perform basic statistics on the distance between each beat, for example calculating
the mean and standard deviation of this distance. However, the normalized pitch
and timbre information already captures this data. Another possibility is detecting
certain patterns of beats, which could differentiate the syncopated dubstep or glitch
beats from the steady pulse of electro-house. But once again, every beat is
accompanied by a sound with a specific timbre and pitch, so this feature would not
add any significantly new information.
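For illustration, the rejected inter-beat statistics described above could be computed from beats_start in a few lines of NumPy (a minimal sketch; the helper name is not part of the dataset's API):

```python
import numpy as np

def beat_stats(beats_start):
    """Mean and standard deviation of the gaps between consecutive beats."""
    gaps = np.diff(np.asarray(beats_start, dtype=float))
    return gaps.mean(), gaps.std()

# A perfectly steady 120 BPM pulse has a 0.5 s gap and zero deviation.
mean_gap, std_gap = beat_stats([0.0, 0.5, 1.0, 1.5, 2.0])
```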
2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect
all of the electronic songs in the Million Song Dataset. The easiest reliable way to
achieve this was to iterate through each song in the database and save the information
for the songs where any of the artist genre tags in artist_mbtags matched an
electronic music genre. While this measure was not fully accurate, because it looks at
the genre of the artist, not the song, specific genre information for each song was not
as easily accessible, so this indicator was nearly as good a substitute. To generate a
list of the genres that electronic songs would fall under, I manually searched through
a subset of the MSD to find all genres that seemed to be related to electronic music.
In the case of genres that were sometimes, but not always, electronic in nature, such
as disco or pop, I erred on the side of caution and did not include them in the list
of electronic genres. In these cases, false positives, such as primarily rock songs that
happen to have the disco label attached to the artist, could inadvertently be included
in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
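The filtering step can be sketched as follows. The tag-matching helper is plain Python; the commented-out scan assumes the dataset's standard hdf5_getters module and a local directory of .h5 song files, so treat those parts as illustrative rather than exact:

```python
def is_electronic(artist_tags, target_genres):
    """True if any artist tag matches a genre in the electronic list."""
    tags = {t.decode() if isinstance(t, bytes) else t for t in artist_tags}
    return any(tag.lower() in target_genres for tag in tags)

# Illustrative scan over the dataset (assumes the MSD's hdf5_getters module):
# import glob, hdf5_getters
# for path in glob.glob('MillionSongSubset/**/*.h5', recursive=True):
#     h5 = hdf5_getters.open_h5_file_read(path)
#     try:
#         if is_electronic(hdf5_getters.get_artist_mbtags(h5), set(target_genres)):
#             pass  # save this song's pitch and timbre metadata
#     finally:
#         h5.close()
```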
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically
informed manner. The study first takes the raw sound data and converts it into
a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest
amount. Then it computes the most likely chord by comparing the 4 most
common types of chords in popular music (major, minor, dominant 7, and minor 7) to
the observed chord. The most common chords are represented as "template chords"
and contain 0's and 1's, where the 1's represent the notes played in the chord. For
example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, the Spearman's rho coefficient is
computed over every template chord:

ρ_{CT,c} = (1 / (σ_CT · σ_c)) · Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄)

where C̄T is the mean of the values in the template chord, σ_CT is the standard
deviation of the values in the chord, and the operations on c are analogous. Note
that the summation is over each of the 12 pitch classes. The chord
template with the highest value of ρ is selected as the chord for the time frame.
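This template-matching step can be sketched in Python as follows. The function and variable names are illustrative, and the correlation used here is the product-moment form written above, applied directly to the template and chroma values:

```python
import numpy as np

# The four C-rooted template chords from the text, rotated to all 12 roots.
BASE_TEMPLATES = {
    'major':      [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],  # C E G
    'minor':      [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],  # C Eb G
    'dominant 7': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],  # C E G Bb
    'minor 7':    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],  # C Eb G Bb
}

def best_chord(chroma):
    """Return (root, chord type) of the template that correlates best with a frame."""
    chroma = np.asarray(chroma, dtype=float)
    best, best_rho = None, -np.inf
    for name, base in BASE_TEMPLATES.items():
        for root in range(12):
            template = np.roll(base, root)  # same chord type, shifted root
            rho = np.corrcoef(template, chroma)[0, 1]
            if rho > best_rho:
                best, best_rho = (root, name), rho
    return best
```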
After this is performed for each time frame, the values are smoothed, and then the
change between adjacent chords is observed. The reasoning behind this step is that
by measuring the relative distance between chords, rather than the chords themselves,
all songs can be compared in the same manner even though they may have different
key signatures. Finally, the study takes the types of chord changes and classifies
them under 8 possible categories called "H-topics." These topics are more abstracted
versions of the chord changes that make more sense to a human, such as "changes
involving dominant 7th chords."
In my preliminary implementation of this method on an electronic dance music
corpus, I made a few modifications to Mauch's study. First, I smoothed out time
frames before computing the most probable chords, rather than smoothing the most
probable chords. I did this to save time and to reduce volatility in the chord
measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference,
which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly
under 1 second and, for preliminary testing, appeared to be a good interval for each
time block. Second, as mentioned in the literature section, I did not abstract the
chord changes into H-topics. This decision also stemmed from time constraints, since
deriving semantic chord meaning from EDM songs would require careful research
into the types of harmonies and sounds common in that genre of music. Below I
have included a high-level visualization of the pitch metadata found in a sample song,
"Firestarter" by The Prodigy, and how I converted the metadata into a chord change
vector that I could then feed into the Dirichlet Process algorithm.
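The 5-frame averaging step can be sketched as follows (a minimal version; the function name is illustrative, and any incomplete trailing block is simply dropped):

```python
import numpy as np

def smooth_frames(pitches, block=5):
    """Average an N x 12 pitch matrix over consecutive blocks of `block` frames."""
    pitches = np.asarray(pitches, dtype=float)
    n = (len(pitches) // block) * block  # drop any incomplete trailing block
    return pitches[:n].reshape(-1, block, pitches.shape[1]).mean(axis=1)
```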
[Figure: converting the pitch metadata of "Firestarter" by The Prodigy into a chord
change vector. (1) Start with the raw pitch data, an N x 12 matrix, where N is the
number of time frames in the song and 12 the number of pitch classes. (2) Average
the distribution of pitches over every 5 time frames. (3) Calculate the most likely
chord for each block of 5 time frames using Spearman's rho, e.g. F# major =
(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0). (4) For two adjacent chords, calculate the
change between them and increment the count in a table of chord change frequencies
(192 possible chord changes); e.g. F# major → G# major is a major-to-major change
with step size 2 and chord shift code 6, so chord_changes[6] += 1. The result is a
final 192-element vector where chord_changes[i] is the number of times the chord
change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by
the length of the song, so that each song's number of chord changes was measured per
second.
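The counting and per-second normalization can be sketched as follows. The thesis does not spell out its exact code assignment, so the encoding below (from-type, to-type, semitone step) is an assumption; it does, however, yield codes 0, 60, 120, and 180 for same-type, no-change transitions, which matches how those codes are described in the Analysis chapter:

```python
CHORD_TYPES = ['major', 'minor', 'dominant 7', 'minor 7']

def chord_change_code(from_chord, to_chord):
    """Map a pair of (root, type) chords to one of 4 * 4 * 12 = 192 codes."""
    (r1, t1), (r2, t2) = from_chord, to_chord
    step = (r2 - r1) % 12  # semitone step between the two roots
    return (CHORD_TYPES.index(t1) * 4 + CHORD_TYPES.index(t2)) * 12 + step

def chord_change_vector(chords, duration):
    """192-element chord change frequencies, normalized per second of song."""
    counts = [0.0] * 192
    for a, b in zip(chords, chords[1:]):
        counts[chord_change_code(a, b)] += 1
    return [c / duration for c in counts]
```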
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8]. After collecting all song metadata, I took a random
sample of 20 songs from each year, starting at 1970. The reason I forced the sampling
to 20 randomly sampled songs from each year, and did not take a random sample of
songs from all years at once, was to prevent bias towards any type of sounds. As seen
in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The
mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the
years is σ = 7.060. A "random sample" over all songs would almost certainly include
a disproportionate amount of more recent songs. In order to not miss out on sounds
that may be more prevalent in older songs, I required a set number of songs from each
year. Next, from each randomly selected song, I selected 20 random timbre frames,
in order to prevent any biases in data collection within each song. In total, there
were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames
using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to
100 and selecting the number of clusters with the lowest Bayesian Information Criterion
(BIC), a statistical measure commonly used to select the best-fitting model. The
BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters
and saved the mean values of each of the 12 timbre dimensions for each cluster formed.
In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs, each song now had the same 46 timbre clusters
but different frequencies in each song. When reading in the metadata from each song,
I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song. Finally,
as with the pitch data, I divided all observed counts by the duration of the song in
order to normalize each song's timbre counts.
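The BIC-driven model selection described above can be sketched with scikit-learn's GaussianMixture. The function names and the coarse search grid are illustrative assumptions, not the exact code used:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_timbre_clusters(frames, candidate_ks=range(10, 101, 10), seed=0):
    """Fit GMMs over a grid of cluster counts and keep the lowest-BIC model."""
    frames = np.asarray(frames, dtype=float)
    best_gmm, best_bic = None, np.inf
    for k in candidate_ks:
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic
    return best_gmm

def timbre_frequencies(gmm, song_frames, duration):
    """Per-song feature: how often each timbre cluster appears, per second."""
    counts = np.bincount(gmm.predict(song_frames), minlength=gmm.n_components)
    return counts / duration
```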
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the
data. For each song, I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list, giving each song a total of 238
features. However, there is a problem with this setup: the pitch data will inherently
dominate the clustering process, since it contains almost 3 times as many features
as the timbre data. While there is no built-in function in scikit-learn's DPGMM implementation
to give different weights to each feature, I considered another possibility to remedy
this discrepancy: duplicating the timbre vector a certain number of times and
concatenating that to the feature set of each song. While this strategy runs the risk
of corrupting the feature set and turning it into something that does not accurately
represent each song, it is important to keep in mind that even without duplicating
the timbre vector, the feature set consists of two separate feature sets concatenated
to each other. Therefore, timbre duplication appears to be a reasonable strategy to
weight pitch and timbre more evenly.
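A sketch of this feature assembly, assuming (since the text does not state the exact count) four copies of the timbre vector, which brings 4 × 46 = 184 timbre features roughly level with the 192 pitch features:

```python
import numpy as np

def build_feature_vector(chord_changes, timbre_freqs, timbre_copies=4):
    """Concatenate 192 pitch features with duplicated 46-element timbre features."""
    return np.concatenate([np.asarray(chord_changes, dtype=float)]
                          + [np.asarray(timbre_freqs, dtype=float)] * timbre_copies)
```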
After this modification, I tweaked a few more parameters before obtaining my
final results. Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second, but it also had the undesired effect
of making the data too small. Timbre and pitch frequencies per second were almost
always less than 1.0 and many times hovered as low as 0.002 for nonzero values.
Because all of the values were very close to each other, using common values of α
in the range of 0.1 to 1000-2000 was insufficient to push the songs into different
clusters. As a result, every song fell into the same cluster. Increasing the value
of α by several orders of magnitude, to well over 10 million, fixed the problem, but
this solution presented two problems. First, tuning α to experiment with different
ways to cluster the music would be problematic, since I would have to work with
an enormous range of possible values for α. Second, pushing α to such high values
is not appropriate for the Dirichlet Process. Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters, but a value
of α that high is in principle always assigning each new song to a new cluster. On
the other hand, varying α between 0.1 and 1000, for example, presents a much wider
range of flexibility when assigning clusters. While this may be possible by varying
the values of α an extreme amount with the data as it currently is, we would be using
the Dirichlet Process in a way it should mathematically not be used. Therefore,
multiplying all of the data by a constant value, so that we can work in the appropriate
range of α, is the ideal approach. After some experimentation, I found that k = 10 was
an appropriate scaling factor. After initial runs of the Dirichlet Process, I found
that there was a slight issue with some of the earlier songs. Since I had only artist
genre tags, not song-specific tags, for each song, I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre, including
the generic term 'electronic'. There were some bands, mostly older ones from the
1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured
mostly non-electronic songs, I decided to exclude them from my study and generated
a blacklist of these music artists. While it was infeasible to look through
every single song and determine whether it was electronic or not, I was able to look
over the earliest songs in each cluster. These songs were the most important to verify
as electronic, because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify the different groups into which EM songs are
clustered and to identify the most unique artists and genres. While the second task is
very simple, because it requires looking at the earliest songs in each cluster, the first
is difficult to gauge the effectiveness of. While I can look at the average chord change
and timbre category frequencies in each cluster, as well as other metadata, attaching
more semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clustering in each case, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit of components, or clusters, allowed to 50. The values of α I used
resulted in 9, 14, and 19 clusters formed.
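These clustering runs can be sketched as follows. The DPGMM class used at the time has since been removed from scikit-learn; BayesianGaussianMixture with a Dirichlet-process prior is its modern replacement, so this is an approximate reconstruction rather than the original call:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def run_dirichlet_process(X, alpha, max_components=50, seed=0):
    """Cluster songs with a truncated Dirichlet Process Gaussian mixture."""
    dpgmm = BayesianGaussianMixture(
        n_components=max_components,                 # upper limit on clusters
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,            # the concentration parameter
        max_iter=500,
        random_state=seed,
    ).fit(X)
    return dpgmm.predict(X)

# e.g. for alpha in (0.05, 0.1, 0.2): labels = run_dirichlet_process(features, alpha)
```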
3.2 Findings

3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are
skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions
of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is
shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist industrial space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and they did not sound unique, so
I discarded them from the clusters). Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them,
for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process, these songs were lumped together with the common theme of dense, melodic
arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and Jon Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, has a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and
electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-
note rhythm that, combined with the ethereal synths that contain certain chord
progressions, sounded very similar to trance music. In fact, I found that stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
music was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change, type 60 to minor → minor with no note change, type
120 to dominant 7th major → dominant 7th major with no note change, and type 180
to dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that adjacent chords in the song are remaining in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to intuitively interpret. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in my study. I ended up comparing my subjective
summaries of each cluster and comparing the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data is very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value but also comparing the clusters formed
to those at other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely, while others were more difficult to interpret. For example, cluster
3 from the α = 0.1 run contained a similar number of songs and a similar
distribution of release years to cluster 9 from the α = 0.05 run. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
49
artists in cluster 301 they were different from the earliest artists in cluster 9005
One particular artist Bill Nelson stood out for having a particularly novel song
ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and
twangy synth beat that when sped up sounded like minimalist acid house music
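The effect of the concentration parameter α on the number of clusters can be illustrated with the Chinese Restaurant Process that underlies the Dirichlet Process. The following is a minimal, stand-alone simulation, not the model actually fit in this thesis (which also lets the pitch and timbre likelihoods shape the clusters); it only shows the prior's tendency to open more clusters as α grows:

```python
import random

def crp_partition(n_songs, alpha, rng):
    """Simulate the Chinese Restaurant Process underlying a Dirichlet
    Process: each new song joins an existing cluster with probability
    proportional to that cluster's size, or opens a new cluster with
    probability proportional to alpha."""
    cluster_sizes = []
    for i in range(n_songs):
        r = rng.random() * (i + alpha)   # i songs are already assigned
        for k, size in enumerate(cluster_sizes):
            if r < size:
                cluster_sizes[k] += 1    # join an existing cluster
                break
            r -= size
        else:
            cluster_sizes.append(1)      # open a new cluster
    return cluster_sizes

rng = random.Random(0)
n = 2000  # kept small here; the thesis's EM subset has roughly 23,000 songs
for alpha in (0.05, 0.1, 0.2):
    ks = [len(crp_partition(n, alpha, rng)) for _ in range(30)]
    print(alpha, sum(ks) / len(ks))  # average cluster count tends to grow with alpha
```

Under the prior alone the expected number of clusters is only about 1 + α log n, far fewer than the 14 to 19 clusters reported here; in the full Dirichlet Process Gaussian Mixture Model the data likelihood pushes the count much higher, but the monotone effect of α is the same.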
While the α = 0.05 clusters differentiated mostly on general moods and classes of instruments (rock vs. non-electronic vs. electronic), the α = 0.1 clusters picked up more nuanced differences in instrumentation and mood. For example, cluster 16-0.1 contained songs featuring orchestral string instruments, especially violin. The songs themselves varied significantly by traditional genre, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different by traditional genre standards can be grouped together by certain shared instruments or sounds. Another cluster, 28-0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6-0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not map easily to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre and chord-change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm new age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong, repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to share the same peak timbre category, were noted for containing strong, repetitive beats. From this group, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process formed 22 clusters. Three of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters mapped easily to clusters from the other two α values, like cluster 17-0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28-0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also reflect the difficulty of distinguishing clusters: the y-axis ranges for all of the years are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28-0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. I therefore concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies for addressing them; I then suggest potential paths for researchers to build upon my experiment and close with final remarks on this thesis.
4.1 Design Flaws in the Experiment
While I made every effort to ensure the integrity of this experiment, various factors limited it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. Looking through the tracks, I did see many important artists, which lent some credibility to the dataset. However, several other artists I was surprised to see missing, and the artists that were included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources available to me, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a predetermined list of EM genres. However, this strategy is not always effective, since some artists have produced only a small selection of EM songs alongside much more rock or other non-EM music. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
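The difference between the artist-level filter used here and the song-level filter described above can be sketched in a few lines. The tag data below is hypothetical (the real implementation would read the Last.fm tag dataset, whose field names may differ), and the genre list is abbreviated:

```python
# Hypothetical tag data: artist-level tags (as in the MSD's artist_mbtags)
# versus song-level tags (as in the Last.fm dataset).
EM_GENRES = {'house', 'techno', 'trance', 'ambient', 'breakbeat'}

def is_em(tags):
    """True if any tag matches the predetermined EM genre list exactly.
    Exact matching avoids substring false positives (e.g. 'trap'
    matching inside an unrelated longer tag)."""
    return any(t.lower() in EM_GENRES for t in tags)

# An artist-level filter admits every song by a tagged artist...
artist_tags = {'Artist A': ['techno', 'rock'], 'Artist B': ['rock']}
# ...while a song-level filter can reject the artist's non-EM songs.
song_tags = {('Artist A', 'Song 1'): ['techno'],
             ('Artist A', 'Song 2'): ['rock']}

kept = [(a, s) for (a, s), tags in song_tags.items() if is_em(tags)]
print(kept)  # only the genuinely electronic songs survive
```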
Another, more addressable weakness in my experiment was the graphical analysis of the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had straightforward semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
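The model-selection step mentioned above (choosing the number of timbre categories at which the BIC is lowest) can be sketched in miniature. The following is a stand-alone, simplified illustration on synthetic one-dimensional data, with a crude sign-based split standing in for a full EM fit; it is not the thesis's actual 12-dimensional procedure:

```python
import math
import random

def gaussian_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def fit_gaussian(xs):
    mu = sum(xs) / len(xs)
    var = sum((x - mu)**2 for x in xs) / len(xs)
    return mu, math.sqrt(var)

def bic(log_lik, n_params, n_obs):
    # Schwarz's Bayesian Information Criterion: lower is better.
    return n_params * math.log(n_obs) - 2 * log_lik

rng = random.Random(1)
# Synthetic 1-D "timbre" values drawn from two well-separated modes.
data = [rng.gauss(-3, 1) for _ in range(300)] + [rng.gauss(3, 1) for _ in range(300)]

# Model 1: a single Gaussian (2 parameters).
mu, sd = fit_gaussian(data)
ll1 = sum(gaussian_logpdf(x, mu, sd) for x in data)
bic1 = bic(ll1, 2, len(data))

# Model 2: two Gaussians, crudely assigned by sign (5 parameters: two
# means, two stdevs, one mixing weight). A real fit would use EM.
left = [x for x in data if x < 0]
right = [x for x in data if x >= 0]
w = len(left) / len(data)
(mu_l, sd_l), (mu_r, sd_r) = fit_gaussian(left), fit_gaussian(right)
ll2 = sum(math.log(w * math.exp(gaussian_logpdf(x, mu_l, sd_l))
                   + (1 - w) * math.exp(gaussian_logpdf(x, mu_r, sd_r)))
          for x in data)
bic2 = bic(ll2, 5, len(data))
print(bic1, bic2)  # the two-component model should score lower
```

Sweeping the number of components and keeping the count with the lowest BIC is, in essence, how the 46 timbre categories were selected.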
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small research group without extensive funding and influence. Once these problems are resolved, the dataset's songs are accessible, and methods for comparing songs to each other are in place, the next steps would be to analyze the results further. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics continues to grow, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=sys.maxsize)

'''This code pulls the relevant metadata for each electronic song
out of the Million Song Dataset'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print('found electronic music song at {0} seconds'.format(time.time() - start_time))
            count += 1
            print('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=sys.maxsize)

# mean of a list; applied column-wise via zip below
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_match in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print('found most likely chords at {0} seconds'.format(time.time() - time_start))
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print('calculated chord changes at {0} seconds'.format(time.time() - time_start))

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print('found most likely timbre categories at {0} seconds'.format(time.time() - time_start))
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print('preprocessing finished; writing results to file at time {0}'.format(time.time() - time_start))
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print('file merging complete at time {0}'.format(time.time() - time_start))
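As a sanity check on the chord-change encoding above, the snippet below (a stand-alone sketch, assuming chord qualities are numbered 1 through 4 and note shifts run 0 through 11, as in the helper methods) enumerates every possible transition and confirms that the scheme yields exactly 16 × 12 = 192 distinct categories, one per combination of quality pair and root movement:

```python
codes = {}
for q1 in range(1, 5):               # quality of the first chord
    for q2 in range(1, 5):           # quality of the second chord
        for note_shift in range(12): # root movement in semitones
            key_shift = 4 * (q1 - 1) + q2             # 1..16
            code = 12 * (key_shift - 1) + note_shift  # 0..191
            codes[code] = (q1, q2, note_shift)

assert sorted(codes) == list(range(192))  # the encoding is a bijection
print(len(codes))
```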
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print('getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start))
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print('finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start))

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
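The per-year sampling rule used above (keep each song with probability N divided by that year's song count, capped at 1, so that in expectation about N songs survive from every year) can be isolated as a small helper. A minimal sketch, using year counts from the table above:

```python
def keep_probability(n_target, n_in_year):
    """Probability of keeping one song so that, in expectation,
    about n_target songs survive from a year with n_in_year songs."""
    return min(1.0, n_target / float(n_in_year))

# Sparse years are kept whole; crowded years are downsampled so that
# no single year dominates the timbre sample.
for year, count in [(1956, 2), (1980, 131), (2007, 2175)]:
    p = keep_probability(20, count)
    print(year, p, p * count)  # expected number of survivors
```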
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235
236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237
238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in
segments_pitches]242 return segments_pitches_new243
65
244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250
251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (template family, index within family)
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # correlation-style score between the cluster profile and the song's timbre vector
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / ((
                stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
I hereby declare that I am the sole author of this thesis.
I authorize Princeton University to lend this thesis to other institutions or
individuals for the purpose of scholarly research.
Matthew Silver
I further authorize Princeton University to reproduce this thesis by photocopying or
by other means, in total or in part, at the request of other institutions or individuals
for the purpose of scholarly research.
Matthew Silver
Abstract
This thesis provides a foundation of code and models to mathematically analyze
the evolution of Electronic Music (EM) over time. Using chronologically ordered
data from the Million Song Dataset, it utilizes a Dirichlet Process Gaussian Mixture
Model to assign the songs to clusters based on pitch and timbre data, without
any prior assumptions about the clusters. By examining the characteristic
sounds of songs in each cluster, the following conclusions are reached:
1. Which artists and songs were most innovative for their time.
2. Potential new ways in which the genealogy of, and relations between, EM genres
can be imagined.
Finally, this thesis evaluates the strengths and weaknesses of the model used and
suggests future work that can be done to improve upon it.
Acknowledgements
I would like to thank Professor Ramon Van Handel for advising me on my thesis.
You helped me figure out how to narrow down my goals into a concrete topic and
provided useful input on how to model and frame my problem effectively. I would
also like to thank the Princeton ORFE department for providing funding to download
and manage the dataset I used for this project. Michael Bino and the Computational
Science and Engineering Support group (CSES) were incredibly helpful in getting me
set up to run my programs on the Princeton servers. Without your help, I would
have had a much harder time getting my 300GB dataset of music to play nice. I
would also like to thank Jeffrey Scott Dwoskin for providing the LaTeX template from
which I wrote this thesis. And finally, I would like to thank my family and friends,
especially Lucas and Kathryn, for providing continuous support and feedback. The
work we all poured into our theses is incredible, and we've made it through this
sometimes rocky journey at the greatest university of all.
On a personal note: regardless of whether you are a current Princeton undergraduate
or are just interested in my work, push yourself beyond your comfort zone
and don't let grades or other people's opinions get in your way. Take classes and
join new groups that reflect your passions. At the same time, love yourself. Take
care of your body and have some fun without feeling guilty. And finally, form great
relationships. While Princetonians sometimes appear hypercompetitive and forced,
they are genuinely sweet and brilliant people who you will treasure for life. These
four years at Princeton have gone by in a flash, and in the whirlwind of highs and
lows I've gone through, these are the most important lessons I've learned.
To my parents
Contents
Abstract iii
Acknowledgements iv
List of Tables viii
List of Figures ix
1 Introduction 1
1.1 Background Information 1
1.2 Literature Review 3
1.3 The Dataset 10
2 Mathematical Modeling 12
2.1 Determining Novelty of Songs 12
2.2 Feature Selection 14
2.3 Collecting Data and Preprocessing Selected Features 20
2.3.1 Collecting the Data 20
2.3.2 Pitch Preprocessing 21
2.3.3 Timbre Preprocessing 25
3 Results 27
3.1 Methodology 27
3.2 Findings 29
3.2.1 α = 0.05 29
3.2.2 α = 0.1 33
3.2.3 α = 0.2 38
3.3 Analysis 46
4 Conclusion 53
4.1 Design Flaws in Experiment 53
4.2 Future Work 55
4.3 Closing Remarks 56
A Code 57
A.1 Pulling Data from the Million Song Dataset 57
A.2 Calculating Most Likely Chords and Timbre Categories 58
A.3 Code to Compute Timbre Categories 60
A.4 Helper Methods for Calculations 61
Bibliography 68
List of Tables
3.1 Song cluster descriptions for α = 0.05 33
3.2 Song cluster descriptions for α = 0.1 38
3.3 Song cluster descriptions for α = 0.2 45
List of Figures
1.1 A user's taste profile generated by Spotify 4
1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975 8
2.1 scikit-learn example of GMM vs. DPGMM and tuning of α 15
2.2 Number of Electronic Music Songs in Million Song Dataset from Each Year 26
3.1 Song year distributions for α = 0.05 31
3.2 Timbre and pitch distributions for α = 0.05 32
3.3 Song year distributions for α = 0.1 35
3.4 Timbre and pitch distributions for α = 0.1 37
3.5 Song year distributions for α = 0.2 41
3.6 Timbre and pitch distributions for α = 0.2 44
Chapter 1
Introduction
1.1 Background Information
Electronic Music (EM) is an increasingly popular genre of music with an immense
presence and influence on modern culture. Because the genre is new as a whole and
is arguably more loosely structured than other genres - technology has enabled the
creation of a wide range of sounds and easy blending of existing and new sounds alike
- formal analysis, especially mathematical analysis, of the genre is fairly limited and
has only begun growing in the past few years. As a fan of EM, I am interested in
exploring how the genre has evolved over time. More specifically, my goal with this
project was to design some structure or model that could help me identify which EM
artists have contributed the most stylistically to the genre. Oftentimes famous EM
artists do not create novel-sounding music but rather popularize an existing style,
and the motivation of this study is to understand who has stylistically contributed
the most to the EM scene versus those who have merely popularized aspects of it.
As the study progressed, the manner in which I constructed my model lent itself to
a second goal of the thesis: new ways in which we can imagine EM genres.
While there exists an extensive amount of research analyzing music trends from
a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM
from a mathematical perspective, and especially with respect to any computationally
measurable trends in the genre, is close to nonexistent. EM has been analyzed to a
lesser extent than other common genres of music in the academic world, most likely
due to existing for a shorter amount of time and being less rooted in prominent
social and cultural events. In fact, the first published reference work on EM did not
exist until 2012, when Professor Mark J. Butler from Northwestern University edited
and published Electronica, Dance and Club Music, a collection of essays exploring
EM genres and culture [1]. Furthermore, there are very few comprehensive visual
guides that allow a user to relate every genre to each other and easily observe how
different genres converge and diverge. While conducting research, the best guide I
found was not a scholarly source but an online guide created by an EM enthusiast:
Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific
genres grouped by more general genres and represents chronological evolutions by
connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene
I could find. However, the guide's analysis is very qualitative. While each subgenre
contains an explanation of typical rhythm and sounds and includes well-known
songs indicative of the style, the guide was created by someone drawing on historical and
personal knowledge of EM. My model, which creates music genres by chronologically
ordering songs and then assigning them to clusters, is a different approach towards
imagining the entire landscape of EM. The results may confirm Ishkur's Guide's
findings, in which case his guide is given additional merit with mathematical evidence,
or they may differ, suggesting that there may be better ways to group EM
genres. One advantage that guides such as Ishkur's and historically-based scholarly
works have over my approach is that those models are history-sensitive and therefore
may group songs in a way that historically makes sense. On the other hand, my
model is history-agnostic and may not capture the historical context of songs when
clustering. However, I believe that there is still significant merit to my research.
Instead of classifying genres of music by the early genres that led to them, my approach
gives the most credit to the artists and songs that were the most innovative for their
time, and perhaps reveals different musical styles that are more similar to each other
than history would otherwise imply. This way of thinking about music genres, while
unconventional, is another way of imagining EM.
The practice of quantitatively analyzing music has exploded in the last decade,
thanks to technological and algorithmic advances that allow data scientists to constructively
sift through troves of music and listener information. In the literature
review, I will focus on two particular organizations that have contributed greatly to
the large-scale mathematical analysis of music: Pandora, a website that plays songs
similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics
firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly
feature [3]. After evaluating the relevance of these sources to my thesis work, I will
then look over the relevant academic research and evaluate what this research can
contribute.
1.2 Literature Review
The analysis of quantitative music generally falls into two categories: research conducted
by academics and academic organizations for scholarly purposes, and research
conducted by companies and primarily targeted at consumers. First looking at the
consumer-based research, Spotify and Pandora are two of the most prominent
groups and the two I decided to focus on. Spotify is a music streaming service where
users can listen to albums and songs from a wide variety of artists or listen to weekly
playlists generated based on the music the user and the user's friends have listened to.
The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in
Spotify and is driven by music analysis algorithms created by Echo Nest. Using
the Echo Nest code interface, Spotify creates a "taste profile" for each user, which
assesses attributes such as how often a user branches out to new styles of music, how
closely the user's streamed music follows popular Billboard music charts, and so on.
Spotify also looks at the artists and songs the user streamed and creates clusters
of different genres that the user likes (see figure 1.1). The taste profile and music
clusters can then be used to generate playlists geared to a specific user. The genres
in the clusters come from a list of nearly 800 names, which are derived by scraping
the Internet for trending terms in music as well as training various algorithms on a
regular basis by "listening" to new songs [4][5].
Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape
of established and emerging genres of music, the methodology is limited to
pre-defined genres of music. This may serve as a good starting point to compare my
final results to, but my study aims to be as context-free as possible by attaching no
preconceived notions of music styles or genres, instead looking at features that could
be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on existing
genres, Pandora takes a very low-tech and context-free approach to music
clustering. Pandora created the Music Genome Project, a multi-year undertaking
in which skilled music theorists listened to a large number of songs and analyzed up to
450 characteristics in each song [6]. Pandora's approach is appealing to the aim of
my study since it does not take any preconceived notions of what a genre of music
is, instead comparing songs on common characteristics such as pitch, rhythm, and
instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists
at my disposal, nor do I have 10 years to perform such calculations like the dedicated
workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music
Genome Project is intellectual property, so at best I can only rely on the abstract
concepts of the Music Genome Project to drive my study.
In the academic realm, there are no existing studies analyzing quantifiable changes in
EM specifically, but there exist a few studies that perform such analysis on popular
Western music in general. One such study is Measuring the Evolution of Contemporary
Western Popular Music, which analyzes music from 1955-2010, spanning all
common genres. Using the Million Song Dataset, a free public database of songs,
each containing metadata (see section 1.3), the study focuses on the attributes pitch,
timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of
the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients
(MFCC) of a transformed sound signal. More informally, it refers to the sound color,
texture, or tone quality, and is associated with instrument types, recording resources,
and production techniques. In other words, two sounds that have the same pitch
but different tones (for example, a bell and a voice) are differentiated by their timbres.
There are 12 MFCCs that define the timbre of a given sound. Finally, loudness
refers to intrinsically how loud the music sounds, not loudness that a listener can
manipulate while listening to the music. Loudness is the first MFCC of the timbre
of a sound [7]. The study concluded that over time, music has become louder
and less diverse:
    The restriction of pitch sequences (with metrics showing less variety in
    pitch progressions), the homogenization of the timbral palette (with frequent
    timbres becoming more frequent), and growing average loudness levels
    (threatening a dynamic richness that has been conserved until today). This
    suggests that our perception of the new would be essentially rooted on
    identifying simpler pitch sequences, fashionable timbral mixtures, and louder
    volumes. Hence an old tune with slightly simpler chord progressions, new
    instrument sonorities that were in agreement with current tendencies, and
    recorded with modern techniques that allowed for increased loudness levels
    could be easily perceived as novel, fashionable, and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in
a few ways. First, it utilizes the Million Song Dataset, which addresses the issue
of legally obtaining music metadata. As mentioned in section 1.3, the only legal
way to obtain playable music for this study would have been to purchase all the songs I
would include, which is infeasible. While the Million Song Dataset does not contain
the audio files in playable format, it does contain audio features and metadata that
allow for in-depth analysis. In addition, working with the dataset takes out the
work of extracting features from raw audio files, saving an extensive amount of time
and energy. Second, the study establishes specifics for what constitutes a trend
in music. Pitch, timbre, and loudness are core features of music, and examining the
distributions of each among songs over time reveals a lot of information about how
the music industry and consumers' tastes have evolved. While these are not all of the
features contained in a song, they serve as a good starting point. Third, the study
defines mathematical ways to capture music attributes and measure their change
over time. For example, pitches are transposed into the same tonal context, with
binary discretized pitch descriptions based on a threshold, so that each song can be
represented with vectors of pitches that are normalized and compared to other songs.
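The transposition-and-thresholding idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the study's actual code; the threshold of 0.5 and the helper names are my own assumptions.

```python
# Sketch: move a 12-bin chroma (pitch-strength) vector into a common tonal
# context, then discretize it into a binary pitch description.

def transpose_to_c(chroma, key):
    """Rotate a 12-bin chroma vector so the song's key maps to bin 0 (C)."""
    return [chroma[(i + key) % 12] for i in range(12)]

def binarize(chroma, threshold=0.5):
    """Discretize relative pitch strengths against a threshold (assumed 0.5)."""
    return [1 if strength >= threshold else 0 for strength in chroma]

# A segment in D major (key = 2) whose strongest bins are D, F#, and A:
segment = [0.1, 0.0, 0.9, 0.1, 0.2, 0.0, 0.8, 0.1, 0.0, 0.7, 0.1, 0.0]
normalized = binarize(transpose_to_c(segment, key=2))
```

After transposition the strong bins land on C, E, and G, so two segments playing the "same" triad in different keys produce identical binary vectors.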
While this study lays some solid groundwork for capturing and analyzing numeric
qualities of music, it falls short of addressing my goals in a couple of ways.
First, it does not perform any analysis with respect to music genre. While the
analysis performed in this paper could easily be applied to a list of songs in a specific
genre, certain genres might have unique sounds and rhythms relative to other genres
that would be worth studying in greater detail. Second, the study only measures
general trends in music over time. The models used to describe changes are simple
regressions that don't look at more nuanced changes. For example, what styles of
music developed over certain periods of time? How rapid were those changes? Which
styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes
contemporary popular Western music from the 1960s to 2010s by comparing numerical
data on the pitches and timbre of a corpus of 17,000 songs that appeared on the
Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution
of Contemporary Western Popular Music, Mauch's study also creates abstractions
of pitch and timbre in order to provide a consistent and meaningful semantic interpretation
of musical data (see figure 1.2). However, Mauch's study takes this idea a
step further by using genre tags from Last.fm, a music website, and constructing a
hierarchy of music genres using hierarchical clustering. Additionally, the study takes
a crack at determining whether a particular band, the Beatles, was musically groundbreaking
for its time or merely playing off sounds that other bands had already used.
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975
While both Measuring the Evolution of Contemporary Western Popular Music
and Mauch's study created abstractions of pitch and timbre, Mauch's study is more
appealing with respect to my goal because its end results align more closely with
mine. Additionally, the data processing pipeline offers several layers of abstraction,
and depending on my progress, I would be able to achieve at least one of the levels of
abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken
down into its 12 timbre MFCCs and pitch components. Next, the study constructs
"lexicons", or dictionaries of pitch and timbre terms, to which all songs can be compared.
For pitch, the original data is in an N-by-12 matrix, where N is the number of time
segments in the song and 12 the number of notes found in an octave of
pitches. Each time segment contains the relative strengths of each of the 12 pitches.
However, music sounds are not merely a collection of pitches but, more precisely,
chords. Furthermore, the similarity of two songs is not determined by the absolute
pitches of their chords but rather by the progression of chords in the song, all relative to
each other. For example, if all the notes in a song are transposed by one step, the song
will sound different in terms of absolute pitch, but the song will still be recognized
as the original because all of the relative movements from each chord to the next
are the same. This phenomenon is captured in the pitch data by finding the most
likely chord played at each time segment, then counting the change to the next chord
at each time step and generating a table of chord change frequencies for each song.
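The chord-change table described above amounts to tallying consecutive pairs of most-likely chords and normalizing the counts. A minimal sketch, with hypothetical chord labels standing in for the output of a chord detector:

```python
from collections import Counter

def chord_change_table(chords):
    """Count transitions between consecutive most-likely chords and
    normalize to frequencies, giving one chord-change profile per song."""
    transitions = Counter(zip(chords, chords[1:]))
    total = sum(transitions.values())
    return {pair: count / total for pair, count in transitions.items()}

# Hypothetical per-segment chord labels for a short song:
song = ["C", "G", "Am", "F", "C", "G", "Am", "F", "C"]
profile = chord_change_table(song)
# The C -> G change accounts for 2 of the 8 transitions, i.e. frequency 0.25.
```

Because only the transitions are counted, transposing every chord in the song by the same interval yields the same kind of profile, which is exactly the key-invariance the study is after.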
Constructing the timbre lexicon is more complicated, since there is no easy analogue
to chords for comparing songs' timbres. Mauch's study utilizes a Gaussian Mixture
Model (GMM), iterating over k=1 to k=N clusters, where N is a large number:
running the GMM with each prior assumption of k clusters and computing the Bayesian
Information Criterion (BIC) for each model. The lowest of the N BIC values is found,
and that value of k is selected. The selected model contains k different timbre clusters,
and each cluster contains the mean timbre value for each of the 12 timbre components.
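The BIC-based selection of k can be sketched with scikit-learn's current GaussianMixture class (the API available in 2016 differed in naming but not in spirit). The synthetic 12-dimensional data and the upper bound N_MAX below are my own placeholders, not the study's settings:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Synthetic stand-in for per-segment 12-dimensional timbre vectors:
# two well-separated groups, so BIC should favor a small number of clusters.
data = np.vstack([rng.normal(0.0, 1.0, size=(200, 12)),
                  rng.normal(8.0, 1.0, size=(200, 12))])

N_MAX = 6  # illustrative upper bound on candidate cluster counts
bics = []
for k in range(1, N_MAX + 1):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(data)
    bics.append(gmm.bic(data))  # lower BIC = better fit/complexity trade-off

best_k = int(np.argmin(bics)) + 1  # the k with the lowest BIC
best_model = GaussianMixture(n_components=best_k, random_state=0).fit(data)
cluster_means = best_model.means_  # one 12-component mean timbre per cluster
```

The rows of `cluster_means` play the role of the timbre lexicon entries: each new segment can then be assigned to whichever cluster profile it matches best.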
For my research, I decided that the pitch and timbre lexicons would be the most
realistic level of abstraction I could obtain. Mauch's study adds an additional layer
to pitch and timbre by identifying the most common patterns of chord changes and
most common timbre rhythms and creating more general tags from these combined
terms, such as "stepwise changes indicating modal harmony" for a pitch topic and
"oh, rounded, mellow" for a timbral topic. There were two problems with using this
final layer of abstraction for my study. First, attaching semantic interpretations to
the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen
to sound samples containing all of the different timbral categories I identified and
attach user interpretations to them. For the chords, not only would I have to
perform the same analysis as on timbre, but also pay careful attention to identifying which
chords correspond to common sound progressions in popular music, a task that I am
not qualified for and did not have the resources for this thesis to seek out. Second,
this final layer of abstraction was not necessary for the end goal of my paper. In
fact, consolidating my pitch and timbre lexicons into simpler phrases would run the
risk of pigeonholing my analysis and preventing me from discovering more nuanced
patterns in my final results. Therefore, I decided to focus on pitch and timbral
lexicon construction as the furthest levels of abstraction when processing songs for
my thesis. Mathematical details on how I constructed the pitch and timbral lexicons
can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I needed access to an extensive database of
music. Until recently, acquiring a substantial corpus of music data was a difficult and
costly task. It is illegal to download music audio files from video and music-sharing
sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer
90-second previews of songs, but using only segments of songs, and usually segments
that showcase the chorus of the song, is not a reliable way to capture the entire
essence of a song. Even if I were to legally download entire audio files for free, I would
run into additional issues. Obtaining a high-quality corpus of song data would be
challenging: writing scripts that crawl music-sharing platforms may not capture all of
the music I am looking for. And once I had the audio files, I would have to perform
audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem.
The Million Song Dataset (MSD) is a collection of metadata for one million music
tracks dating up to 2011. Various organizations such as The Echo Nest, Musicbrainz,
7digital, and Last.fm have contributed different pieces of metadata. Each song is
represented as a Hierarchical Data Format file (HDF5), which can be loaded as a
JSON object. The fields encompass topical features such as the song title, artist,
and release date, as well as lower-level features such as the loudness, starting beat
time, pitches, and timbre of several segments of the song [9]. While the MSD is
the largest free and open-source music metadata dataset I could find, there is no
guarantee that it adequately covers the entire spectrum of EM artists and songs.
This quality limitation is important to consider throughout the study. A quick look
through the songs, including the subset of data I worked with for this report, showed
that there were several well-known artists and songs from the EM scene. Therefore,
while the MSD may not contain all desired songs for this project, it contains an
adequate number of relevant songs to produce some meaningful results. Additionally,
laying the groundwork for modeling the similarities between songs and identifying
groundbreaking ones is the same regardless of the songs included, and the following
methodologies can be implemented on any similarly-formatted dataset, including one
with songs that might currently be missing from the MSD.
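Since each MSD track is a single HDF5 file, the data is straightforward to read programmatically. The toy file below only illustrates the general HDF5 access pattern with made-up field names; the MSD's real internal layout is more involved, which is why the dataset ships with the hdf5_getters helper module used in the appendix code:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.gettempdir(), "toy_song.h5")

# Write a toy stand-in for one track file (illustrative layout, not the MSD's).
with h5py.File(path, "w") as f:
    f.attrs["title"] = "Example Track"
    f.attrs["year"] = 2009
    # per-segment chroma: N time segments x 12 pitch strengths
    f.create_dataset("segments_pitches", data=np.random.rand(30, 12))

# Read it back, as one would read fields from a real track file.
with h5py.File(path, "r") as f:
    title = f.attrs["title"]
    pitches = f["segments_pitches"][...]
```

With the real dataset, the same idea is wrapped by calls like `hdf5_getters.get_segments_pitches(h5)` in Appendix A, so user code never touches the raw layout.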
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be,
an important aspect of my research. My problem, how to mathematically determine
which songs were unique for their time, requires an algorithm in which each song is
introduced in chronological order, either joining an existing category or starting a
new category based on its musical similarity to songs already introduced. Clustering
algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the
partitioning of a dataset into a predetermined number of clusters, assume that number
is fixed in advance. While this process would work if we knew
exactly how many genres of EM existed, if we guess wrong, our end results may contain
clusters that are wrongly grouped together or separated. It is much better to
apply a clustering algorithm that does not make any assumptions about this number.
One particularly promising process that addresses the issue of the number of
clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are
useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters, and (2) by sorting the songs
in chronological order before running the algorithm and keeping track of which
songs are categorized under each cluster, we can observe the earliest songs in each
cluster and, consequentially, infer which songs were responsible for creating new
clusters. The DP is controlled by a parameter α, known as
the concentration parameter. The expected number of clusters formed is directly
proportional to the value of α, so the higher the value of α, the more likely new
clusters will be formed [10]. Regardless of the value of α, as the number of data
points introduced increases, the probability of a new group being formed decreases.
That is, a "rich get richer" policy is in place, and existing clusters tend to grow in
size. Tweaking the value of the tunable parameter α is an important part of the
study, since it determines the flexibility given to forming a new cluster. If the value
of α is too small, then the criteria for forming clusters will be too strict, and data
that should be in different clusters will be assigned to the same cluster. On the other
hand, if α is too large, the algorithm will be too sensitive and assign similar songs to
different clusters.
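The "rich get richer" behavior and the role of α can be made concrete with the Chinese restaurant process view of the DP, in which the (n+1)-th song joins an existing cluster with probability proportional to that cluster's current size and starts a new cluster with probability proportional to α. A minimal simulation (the α values and song counts below are arbitrary, for illustration only):

```python
import random

def crp_cluster_sizes(n_points, alpha, seed=0):
    """Simulate Dirichlet Process cluster assignments via the Chinese
    restaurant process: point n joins existing cluster c with probability
    |c| / (n + alpha) and opens a new cluster with probability
    alpha / (n + alpha)."""
    rng = random.Random(seed)
    sizes = []  # sizes[c] = number of points currently in cluster c
    for n in range(n_points):
        r = rng.uniform(0, n + alpha)
        for c, size in enumerate(sizes):
            if r < size:
                sizes[c] += 1  # rich get richer: big clusters attract more points
                break
            r -= size
        else:
            sizes.append(1)  # new clusters get rarer as n grows
    return sizes

few = crp_cluster_sizes(1000, alpha=0.05)
many = crp_cluster_sizes(1000, alpha=2.0)
# Larger alpha tends to produce more clusters for the same number of songs.
```

In expectation the number of clusters grows only logarithmically in the number of points, which is exactly why the probability of a brand-new cluster shrinks as more songs are introduced.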
The implementation of the DP was achieved using scikit-learnrsquos library and API for
Dirichlet Process Gaussian Mixture Model (DPGMM) The DPGMM is the formal
name of the Dirichlet Process model used to cluster the data More specifically
scikit-learnrsquos implementation of the DPGMM uses the Stick Breaking method
one of several equally valid methods to assign songs to clusters [11] While the
mathematical details for this algorithm can be found at the following citation [12]
the most important aspects of the DPGMM are the arguments that the user can
specify and tune The first of these tunable parameters is the value α which is the
same parameter as the α discussed in the previous paragraph As seen in Figure 21
on the right side properly tuning α is key to obtaining meaningful clusters The
The center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."
The last argument that a user inputs for the scikit-learn DPGMM implementation is an upper bound on the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
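A minimal sketch of this setup follows. The thesis used scikit-learn's since-removed mixture.DPGMM class, so the sketch below uses the current BayesianGaussianMixture with a Dirichlet-process weight prior, which implements the same model; the synthetic two-blob data stands in for the real song feature vectors:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Two well-separated synthetic clouds stand in for the real N x m nested
# list of song feature vectors described above.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 10.0])

# In older scikit-learn this was mixture.DPGMM(n_components=50, alpha=0.1);
# the class has since been replaced by BayesianGaussianMixture with a
# Dirichlet-process weight prior.
model = BayesianGaussianMixture(
    n_components=50,                  # upper bound on the number of clusters
    weight_concentration_prior=0.1,   # the concentration parameter alpha
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
)
labels = model.fit_predict(X)
```

Even with the generous upper bound of 50 components, the Dirichlet-process prior leaves most components with negligible weight, so far fewer than 50 clusters actually receive songs.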
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters,
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the presence of different tones: sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
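For concreteness, the inter-beat statistics considered (and rejected) above would amount to something like the following sketch, assuming a beats_start vector as stored in the dataset:

```python
import numpy as np

def beat_statistics(beats_start):
    """Mean and standard deviation of the gap between consecutive beats.

    beats_start is a 1-D sequence of beat onset times in seconds, as stored
    in the Million Song Dataset's beats_start field.
    """
    gaps = np.diff(np.asarray(beats_start, dtype=float))
    return gaps.mean(), gaps.std()

# A perfectly steady 120 BPM pulse has a 0.5 s gap with zero deviation.
mean_gap, std_gap = beat_statistics([0.0, 0.5, 1.0, 1.5, 2.0])
```

As argued above, this summary is largely redundant once pitch and timbre counts are normalized per second, which is why beats_start was left out of the final feature set.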
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
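The filtering step can be sketched as a simple intersection test between an artist's tags and the target list; the helper below is illustrative, not the actual collection script, and abbreviates the genre list:

```python
# Abbreviated from the full target_genres list above (illustrative only).
target_genres = {'house', 'techno', 'drum and bass', 'trance', 'dubstep',
                 'ambient', 'electronic'}

def is_electronic(artist_mbtags):
    """Keep a song if any musicbrainz artist tag matches a target genre."""
    return any(tag.lower() in target_genres for tag in artist_mbtags)

keep = is_electronic(['British', 'Techno'])           # matches 'techno'
drop = is_electronic(['rock', 'singer-songwriter'])   # no match
```

Because the test is at the artist level, it admits the false positives discussed above, which is why borderline tags like disco and pop were excluded from the list.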
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over the 12 pitch classes, where 0 is no detection of the pitch and 1 the strongest detection. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0).

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − mean(CT))(c_i − mean(c)) / (σ_CT σ_c)
where mean(CT) is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
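The template-matching step can be sketched as follows: correlate a 12-bin chroma frame against all 48 rotated templates (4 chord types x 12 roots) and keep the best match. This is a reconstruction from the formula above, not Mauch's code, and it uses a Pearson-style correlation as the formula is written:

```python
import numpy as np

# Binary templates rooted at C: major, minor, dominant 7, minor 7.
CHORD_TYPES = {
    'maj':  [0, 4, 7],
    'min':  [0, 3, 7],
    'dom7': [0, 4, 7, 10],
    'min7': [0, 3, 7, 10],
}

def best_chord(chroma):
    """Return the (root, type) template most correlated with a chroma frame."""
    chroma = np.asarray(chroma, dtype=float)
    best, best_rho = None, -np.inf
    for name, intervals in CHORD_TYPES.items():
        base = np.zeros(12)
        base[intervals] = 1.0
        for root in range(12):
            template = np.roll(base, root)
            # correlation between the 0/1 template and the observed frame
            rho = np.corrcoef(template, chroma)[0, 1]
            if rho > best_rho:
                best, best_rho = (root, name), rho
    return best

# A frame dominated by C, E and G should match the C major template (root 0).
root, ctype = best_chord([1.0, 0.1, 0.1, 0.1, 0.9, 0.1,
                          0.1, 0.8, 0.1, 0.1, 0.1, 0.1])
```

Rotating one template per chord type with np.roll is what lets every key be handled by the same 4 prototypes.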
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: pitch preprocessing pipeline, illustrated on "Firestarter" by The Prodigy. Start with the raw pitch data, an N x 12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. Average the distribution of pitches over every block of 5 time frames. Calculate the most likely chord for each block using Spearman's rho (e.g., F major). For each pair of adjacent chords (e.g., F major to G major, a major-to-major change of step size 2, chord shift code 6), increment the count in a table of chord change frequencies (192 possible chord changes). The result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
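One plausible encoding of the 192 chord-change categories (4 source types x 4 target types x 12 root intervals) with per-second normalization is sketched below; the exact code assignment is my own illustrative choice, since the thesis does not spell out its numbering:

```python
import numpy as np

TYPES = ['maj', 'min', 'dom7', 'min7']

def change_code(prev, curr):
    """Map a pair of (root, type) chords to one of 192 change categories:
    4 source types x 4 target types x 12 root intervals (hypothetical
    indexing, for illustration)."""
    (r1, t1), (r2, t2) = prev, curr
    interval = (r2 - r1) % 12
    return (TYPES.index(t1) * 4 + TYPES.index(t2)) * 12 + interval

def chord_change_vector(chords, duration):
    """Count each change between adjacent chords, normalized per second."""
    counts = np.zeros(192)
    for prev, curr in zip(chords, chords[1:]):
        counts[change_code(prev, curr)] += 1
    return counts / duration

# F major -> G major is a major-to-major change two semitones up.
vec = chord_change_vector([(5, 'maj'), (7, 'maj')], duration=212.0)
```

Because the encoding uses root intervals rather than absolute roots, two songs playing the same progression in different keys land in the same bins, which is exactly the key-invariance motivated above.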
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre dimensions for each cluster formed. In the same way that every song had the same 192 chord changes, whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but with different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
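The BIC-driven selection of the timbre vocabulary can be sketched as follows, with synthetic 2-D data standing in for the 16,800 12-dimensional timbre frames (the thesis swept 10 to 100 clusters; the sweep here is smaller):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Three synthetic timbre "sounds" stand in for the real MFCC-like frames.
frames = np.vstack([rng.randn(200, 2),
                    rng.randn(200, 2) + [8, 0],
                    rng.randn(200, 2) + [0, 8]])

# Fit a GMM for each candidate cluster count and keep the lowest BIC.
best_k, best_bic, best_model = None, np.inf, None
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

# Each frame can then be mapped to its most likely timbre cluster, giving
# the per-song frequency counts described above.
cluster_ids = best_model.predict(frames)
```

The predict step is what turns raw frames into a histogram over a shared timbre vocabulary, making songs of any length comparable.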
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data were processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains more than 4 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
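The duplication strategy can be sketched in a few lines: tiling the 46-element timbre vector four times brings its length (184) close to the 192 pitch features. The factor of 4 is my illustrative choice; the thesis does not fix a number:

```python
import numpy as np

def build_feature_vector(chord_changes, timbre_counts, timbre_copies=4):
    """Concatenate pitch and (duplicated) timbre features for one song.

    Duplicating the shorter timbre vector roughly balances its influence
    against the 192 chord-change features in a distance-based clustering.
    """
    assert len(chord_changes) == 192 and len(timbre_counts) == 46
    return np.concatenate([chord_changes,
                           np.tile(timbre_counts, timbre_copies)])

features = build_feature_vector(np.zeros(192), np.ones(46))
```

Duplicating a feature k times multiplies its squared contribution to Euclidean distance by k, which is the crude but effective weighting this strategy relies on.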
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, that would use the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.
After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify the different groups into which EM songs cluster and to identify the most unique artists and genres. While the second task is straightforward, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial, space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; New Wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but with echo
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin; rhythmic emphasis on first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. Three of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist, ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast, ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First looking at α = 005 we see that all of the clusters contain a significant
number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains
a heavier left tail indicating a larger number of songs from the 70s 80s and 90s
Inside the cluster the genres of music varied significantly from a traditional music
lens That is the cluster contained some songs with nearly all traditional rock
instruments others with purely synths and others somewhere in between all which
would normally be classified as different EM genres However under the Dirichlet
Process these songs were lumped together with the common themes of dense melodic
46
melodies (as opposed to minimalistic repetitive or dissonant sounds) The most
prominent artists from the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, has a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out, ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th
note rhythm that, combined with the ethereal synths and their chord
progressions, sounded very similar to trance music. In fact, I found that, stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
music was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself, but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster The type 0 chord change corresponds to major rarr
major with no note change type 60 minor rarr minor with no note change type
120 dominant 7th major rarr dominant 7th major with no note change and type 180
dominant 7th minor rarr dominant 7th minor with no note change It makes sense that
type 0 60 120 and 180 chord changes are frequently observed because it implies
that chords in the song occurring next to each other are remaining in the same key
for the majority of the song The timbre categories on the other hand are more
difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8] While
this strategy worked in Mauchrsquos study given the time and resources at my disposal
this strategy was not practical in my study I ended up comparing my subjective
summaries of each cluster and comparing the charts to see whether certain peaks
in the timbre categories corresponded to specific tones Strangely for α = 005 the
timbre and chord change data is very similar for each cluster This problem does not
occur for when α = 01 or 02 where the graphs vary significantly and correspond
to some of the observed differences in the music In summary below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with the existing literature. The difference between existing
accounts and mine, however, is that from a quantitatively computed perspective I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
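The role of the concentration parameter α can be illustrated with a short sketch. This is not the thesis's actual pipeline: it uses scikit-learn's BayesianGaussianMixture (a truncated Dirichlet-process mixture) on synthetic two-dimensional data, with the α values explored in this chapter.

```python
# Sketch: how the Dirichlet-process concentration parameter alpha steers the
# number of clusters actually used. Synthetic data stands in for MSD features.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# three well-separated synthetic "genres" in a 2-D feature space
data = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
                  for c in ([0, 0], [4, 0], [0, 4])])

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=10,  # truncation level: an upper bound on clusters
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,
        random_state=0,
    ).fit(data)
    # count how many mixture components the fit actually assigns songs to
    n_used = len(np.unique(dpgmm.predict(data)))
    print(alpha, n_used)
```

Larger α makes the process more willing to open new clusters, which matches the behavior observed in this chapter as α moves from 0.05 to 0.2.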
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value, but also comparing them to the clusters formed
at other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely, while others were more difficult to interpret. For example, cluster
3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar
distribution of release years, to cluster 9_0.05. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 grouping differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked
up more nuanced differences in instrumentation and mood. For example, cluster 16_0.1
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together by certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05; that is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New World, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machines, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this clustering, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
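The cross-α cluster comparison described above can be sketched as a simple overlap count: given two cluster labelings of the same songs, tally how many songs each pair of clusters shares. The labelings below are hypothetical toy data, not the actual thesis clusters.

```python
# Sketch: counting overlap between two clusterings of the same song list,
# to see which clusters from one alpha run map onto clusters from another.
from collections import Counter

def cluster_overlap(labels_a, labels_b):
    """Return a Counter mapping (cluster_in_a, cluster_in_b) -> song count."""
    return Counter(zip(labels_a, labels_b))

# toy labelings: the second run splits cluster 1 of the first run in two,
# while cluster 2 maps cleanly onto a single cluster
labels_005 = [1, 1, 1, 1, 2, 2, 2]
labels_01 = [3, 3, 5, 5, 7, 7, 7]
overlap = cluster_overlap(labels_005, labels_01)
print(overlap[(2, 7)])  # 3: all three songs of cluster 2 land in cluster 7
```

A large count concentrated in one cell of the overlap table indicates a clean mapping between clusters; counts spread across several cells indicate the kind of hard-to-map splits observed between the α = 0.05 and α = 0.1 runs.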
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters. Three of these clusters contained only one song each, and upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to differentiate subjectively, this run was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α runs, like cluster 17_0.2, which contains Roland TR
drum machine sounds and is comparable to cluster 28_0.1. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
seem to vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing the clusters: the y-axes for all of the years are quite small,
implying that many of the timbre values averaged out because the songs in each cluster
were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. I therefore
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters that a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
in the MSD are the clusters described in the table for α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings for
α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies for
addressing them; I then offer potential paths for researchers to build upon my
experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the electronic
music available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included were represented by
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept: although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criterion
involved iterating through every song and selecting those whose artist carried a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayes Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed were nontrivial: I could not only
eyeball the timbre measurements on each graph, as I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successfully the Dirichlet
Process clusters songs into distinct categories.
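The BIC-based model selection described in this section can be illustrated with a small sketch: fit Gaussian mixtures with different numbers of timbre categories and keep the count with the lowest BIC. The synthetic 12-dimensional "timbre frames" and the candidate range below are illustrative only; the thesis's actual search went up to 46 categories on real MSD frames.

```python
# Sketch: choosing the number of timbre categories by minimizing BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# synthetic 12-D timbre frames drawn from three well-separated groups
frames = np.vstack([rng.normal(loc=mu, scale=0.5, size=(200, 12))
                    for mu in (-3.0, 0.0, 3.0)])

bic_by_k = {}
for k in (1, 2, 3, 4, 5):  # illustrative range; the thesis searched far higher
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic_by_k[k] = gmm.bic(frames)
best_k = min(bic_by_k, key=bic_by_k.get)  # category count with lowest BIC
```

BIC penalizes extra components, so `best_k` balances fit quality against model complexity, which is the same trade-off that selected 46 timbre categories in the thesis.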
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions, procedures,
evaluations of whether the clustering was effective, and musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD, and constantly updating it, might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, and the dataset,
the songs accessed from it, and the methods for comparing songs to each other are
in place, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even further, and groups such as Spotify amass greater amounts of information
and deeper insights into that information, this relatively new field of study will hopefully
grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has, up
until now, mostly been analyzed from a curious outsider's perspective: qualitatively
described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of each electronic song from the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm', 'idm - intelligent dance music',
                 '8-bit', 'ambient', 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the Dirichlet Process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
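As a sanity check on the chord-shift encoding above, the following standalone sketch (written in Python 3, independent of the appendix script) reproduces the four "no note change" categories discussed in Chapter 3. A chord is taken here to be a (chord_type, root) pair with chord_type in 1..4 (major, minor, dominant 7th major, dominant 7th minor) and root in 0..11; this mirrors the logic in A.2 but is not the thesis code itself.

```python
# Standalone sketch of the chord-shift encoding used in A.2.
def chord_shift(c1, c2):
    # note_shift: semitone distance from the first root to the second, mod 12
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    # key_shift: one of 16 (chord type, chord type) transitions, numbered 1..16
    key_shift = 4 * (c1[0] - 1) + c2[0]
    # one of 192 categories overall
    return 12 * (key_shift - 1) + note_shift

# the four "no note change" categories discussed in Chapter 3
print(chord_shift((1, 0), (1, 0)))  # major -> major, same root: 0
print(chord_shift((2, 5), (2, 5)))  # minor -> minor, same root: 60
print(chord_shift((3, 7), (3, 7)))  # dom7 major -> dom7 major: 120
print(chord_shift((4, 2), (4, 2)))  # dom7 minor -> dom7 minor: 180
```

The encoding covers exactly the categories 0 through 191, which is why the `chord_changes` array in A.2 has length 192.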
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
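The per-year sampling rule used in A.3 keeps each song from a year containing n songs with probability min(1, N/n), so every year contributes about N songs in expectation and sparse years are kept in full. The sketch below restates that rule on a few illustrative year counts (a subset of the table in A.3):

```python
# Sketch: the year-stratified inclusion probability from A.3.
def inclusion_prob(year_count, target=20):
    # keep everything from sparse years; otherwise sample down to ~target songs
    return 1.0 if target / year_count > 1.0 else target / year_count

year_counts = {1956: 2, 1980: 131, 2007: 2175}
probs = {y: inclusion_prob(n) for y, n in year_counts.items()}
# expected sample size per year is prob * count
expected = {y: probs[y] * n for y, n in year_counts.items()}
```

This flattens the MSD's heavy skew toward the 2000s so that the timbre-category model is not dominated by recent production styles.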
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02,
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw msd data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in
        segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
'''given a time segment with distributions of the 12 pitches, find the most
likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((
                stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
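Both matchers above share the same correlation-style score. As a quick sanity check, the pattern can be exercised standalone with synthetic 12-dimensional templates; the function and variable names below are illustrative, not from the thesis code:

```python
import numpy as np

def match_by_correlation(vector, templates, means, stdevs):
    # Same scoring scheme as the thesis helpers: correlate a 12-dimensional
    # vector against each template and keep the index with the largest
    # absolute score.
    best_idx, rho_max = 0, 0.0
    for idx, (tmpl, mean, stdev) in enumerate(zip(templates, means, stdevs)):
        rho = sum((tmpl[i] - mean) * (vector[i] - np.mean(vector))
                  / ((stdev + 0.01) * (np.std(vector) + 0.01))
                  for i in range(12))
        if abs(rho) > abs(rho_max):
            rho_max, best_idx = rho, idx
    return best_idx

# Two toy templates: a flat one and one concentrated in the first slot.
templates = [np.zeros(12), np.eye(12)[0]]
means = [np.mean(t) for t in templates]
stdevs = [np.std(t) for t in templates]
print(match_by_correlation(np.eye(12)[0], templates, means, stdevs))  # → 1
```

A vector concentrated in the first slot correlates with the matching template, while the flat template scores zero, so index 1 wins.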
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.
[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
To my parents
Contents
Abstract
Acknowledgements
List of Tables
List of Figures
1 Introduction
1.1 Background Information
1.2 Literature Review
1.3 The Dataset
2 Mathematical Modeling
2.1 Determining Novelty of Songs
2.2 Feature Selection
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
2.3.2 Pitch Preprocessing
2.3.3 Timbre Preprocessing
3 Results
3.1 Methodology
3.2 Findings
3.2.1 α = 0.05
3.2.2 α = 0.1
3.2.3 α = 0.2
3.3 Analysis
4 Conclusion
4.1 Design Flaws in Experiment
4.2 Future Work
4.3 Closing Remarks
A Code
A.1 Pulling Data from the Million Song Dataset
A.2 Calculating Most Likely Chords and Timbre Categories
A.3 Code to Compute Timbre Categories
A.4 Helper Methods for Calculations
Bibliography
List of Tables
3.1 Song cluster descriptions for α = 0.05
3.2 Song cluster descriptions for α = 0.1
3.3 Song cluster descriptions for α = 0.2
List of Figures
1.1 A user's taste profile generated by Spotify
1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)
2.1 scikit-learn example of GMM vs. DPGMM and tuning of α
2.2 Number of Electronic Music songs in the Million Song Dataset from each year
3.1 Song year distributions for α = 0.05
3.2 Timbre and pitch distributions for α = 0.05
3.3 Song year distributions for α = 0.1
3.4 Timbre and pitch distributions for α = 0.1
3.5 Song year distributions for α = 0.2
3.6 Timbre and pitch distributions for α = 0.2
Chapter 1
Introduction
1.1 Background Information
Electronic Music (EM) is an increasingly popular genre of music with an immense presence and influence on modern culture. Because the genre is new as a whole and is arguably more loosely structured than other genres (technology has enabled the creation of a wide range of sounds and the easy blending of existing and new sounds alike), formal analysis, especially mathematical analysis, of the genre is fairly limited and has only begun growing in the past few years. As a fan of EM, I am interested in exploring how the genre has evolved over time. More specifically, my goal with this project was to design a structure or model that could help me identify which EM artists have contributed the most stylistically to the genre. Oftentimes famous EM artists do not create novel-sounding music but rather popularize an existing style; the motivation of this study is to distinguish those who have stylistically contributed the most to the EM scene from those who have merely popularized aspects of it.
As the study progressed, the manner in which I constructed my model lent itself to a second goal of the thesis: suggesting new ways in which we can imagine EM genres.
While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely because it has existed for a shorter amount of time and is less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not exist until 2012, when Professor Mark J. Butler from Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that relate every genre to each other and make it easy to observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped under more general genres and represents chronological evolutions by connecting the genres in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative: while each subgenre contains an explanation of typical rhythms and sounds and includes well-known songs indicative of the style, the guide was created by someone relying on historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach to imagining the entire landscape of EM. The results may confirm the findings of Ishkur's Guide, in which case the guide is given additional merit with mathematical evidence, or they may differ, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that makes sense historically. On the other hand, my model is history-agnostic and may not capture the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and it may reveal musical styles that are more similar to each other than history would otherwise imply. This way of thinking about music genres, while unconventional, is another way of imagining EM.
The practice of quantitatively analyzing music has exploded in the last decade, thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review, I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song, artist, or album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what it can contribute.
1.2 Literature Review
The quantitative analysis of music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. Looking first at the consumer-based research, Spotify and Pandora are two of the most prominent groups and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists, or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the clusters come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music as well as training various algorithms on a regular basis by "listening" to new songs [4][5].
Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres of music. This may serve as a good point of comparison for my final results, but my study aims to be as context-free as possible, attaching no preconceived notions of music styles or genres and instead looking at features that can be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking in which skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study since it does not take any preconceived notion of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.
In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but a few studies perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all common genres. Using the Million Song Dataset, a free public database of song metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and it is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has become louder and less diverse:
The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
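The transposition-and-thresholding idea can be sketched in a few lines; the helper names and the threshold value below are my own illustrative choices, not taken from the study:

```python
import numpy as np

def to_common_key(chroma, key):
    # Rotate a 12-bin pitch (chroma) vector so that every song shares the
    # same tonal context; `key` is the song's detected key (0-11).
    return np.roll(chroma, -key)

def discretize(chroma, threshold=0.5):
    # Binary pitch description: a pitch class counts as sounding only if
    # its relative strength clears the threshold.
    return (np.asarray(chroma) >= threshold).astype(int)

# A profile with energy on pitch classes 0, 4, and 7 (a major-triad shape).
chroma = [1.0, 0, 0, 0, 0.8, 0, 0, 0.9, 0, 0, 0, 0]
print(discretize(to_common_key(chroma, 0)).tolist())
# → [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
```

After this step, two songs in different keys with the same relative pitch content produce identical binary vectors, which is what makes them comparable.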
While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in the paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that do not look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to the 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)
While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of them. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or dictionaries of pitch and timbre terms to which all songs can be compared. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 is the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, musical sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord-change frequencies for each song.
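The chord-change bookkeeping just described reduces to counting consecutive pairs. A minimal sketch, where the (family, index) chord labels mirror the appendix helper's output format but the sequence itself is made up:

```python
from collections import Counter

def chord_change_table(chords):
    # Count each transition between consecutive most-likely chords and
    # normalize to relative frequencies for the song.
    changes = Counter(zip(chords, chords[1:]))
    total = sum(changes.values())
    return {change: count / total for change, count in changes.items()}

# Hypothetical per-segment (family, index) chord labels.
seq = [(1, 0), (1, 0), (2, 5), (1, 0)]
table = chord_change_table(seq)
print(table[((1, 0), (2, 5))])  # → 0.3333333333333333
```

Because the chords were already transposed to a common key, this table captures relative chord movement, so two songs with the same progression in different keys get the same table.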
Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords for comparing the timbre of songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k=1 to k=N clusters, where N is a large number, running the GMM on each prior assumption of k clusters, and computing the Bayesian Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
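This model-selection loop is straightforward with scikit-learn's GaussianMixture; the data below is a synthetic stand-in for per-segment 12-dimensional timbre vectors, and the range of k is kept small for brevity:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for per-segment timbre vectors: two well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 12)),
               rng.normal(5.0, 1.0, (100, 12))])

# Fit a GMM for each candidate k, compute BIC, and keep the lowest.
models = [GaussianMixture(n_components=k, random_state=0).fit(X)
          for k in range(1, 6)]
best = min(models, key=lambda m: m.bic(X))
print(best.n_components)   # the selected number of timbre clusters
print(best.means_.shape)   # one 12-dimensional mean per timbre cluster
```

The BIC trades off fit against model complexity, so the extra components in the k > 2 models are penalized and the two-cluster model wins on this data.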
For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as for timbre, but I would also have to take careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources to seek out for this thesis. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I needed access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but segments of songs, usually segments that showcase the chorus, are not reliable measures of the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, MusicBrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that several well-known artists and songs from the EM scene were present. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.
One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
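The "rich get richer" dynamic described above can be illustrated with a short simulation of the Chinese Restaurant Process, the sequential view of the Dirichlet Process. This is a toy sketch, not the thesis code; the function name and parameters are mine:

```python
import random

def crp_partition(n_points, alpha, rng):
    """Simulate a Chinese Restaurant Process: each point joins an existing
    cluster with probability proportional to that cluster's size, or starts
    a new cluster with probability proportional to alpha."""
    cluster_sizes = []
    for _ in range(n_points):
        # Existing clusters compete with weight = size; a new cluster has weight alpha.
        weights = cluster_sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(cluster_sizes):
            cluster_sizes.append(1)      # new cluster formed
        else:
            cluster_sizes[choice] += 1   # "rich get richer"
    return cluster_sizes

rng = random.Random(0)
small = [len(crp_partition(200, 0.1, rng)) for _ in range(50)]
large = [len(crp_partition(200, 10.0, rng)) for _ in range(50)]
# A larger alpha yields more clusters on average for the same data size.
print(sum(small) / 50, sum(large) / 50)
```

Averaged over repeated runs, the higher concentration parameter produces many more clusters, while both settings show the probability of a new cluster shrinking as more points arrive.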
The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found in the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The
center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs to the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection".
The last argument that a user inputs to the scikit-learn DPGMM implementation indicates the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
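The three arguments described above (α, the data, and the cluster upper bound) map directly onto scikit-learn's interface. The DPGMM class available when this thesis was written has since been removed from scikit-learn; the sketch below uses its modern replacement, BayesianGaussianMixture with a Dirichlet Process prior, on toy data standing in for the song feature matrix:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Toy stand-in for the song feature matrix: N rows of m features each.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(200, 4)),
    rng.normal(loc=5.0, scale=0.5, size=(200, 4)),
])

# n_components is the upper bound on clusters (the thesis used 50), and
# weight_concentration_prior plays the role of the concentration parameter alpha.
dpgmm = BayesianGaussianMixture(
    n_components=50,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,
    max_iter=500,
    random_state=0,
)
labels = dpgmm.fit_predict(X)
n_used = len(np.unique(labels))
print("clusters actually used:", n_used)
```

Even though 50 components are allowed, the Dirichlet Process prior leaves most of them empty, so the number of occupied clusters is discovered from the data rather than fixed in advance.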
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters,
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of this artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1, 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1, 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener has his or her own opinions about what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the prevalence of different tones: sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered for each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second", so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly
we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding on the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than of the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such
as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
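The filtering rule described above, keeping a song whenever any of its artist's tags matches one of the target genres, can be sketched as follows (is_electronic is a hypothetical helper, not code from the thesis):

```python
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

def is_electronic(artist_mbtags):
    """Keep a song if any of its artist's MusicBrainz tags matches one of
    the target electronic genres (case-insensitive)."""
    tags = {t.strip().lower() for t in artist_mbtags}
    return not tags.isdisjoint(target_genres)

print(is_electronic(['Techno', 'german']))  # True
print(is_electronic(['rock', 'disco']))     # False
```

Note the second call returns False: as discussed above, a tag like disco alone is deliberately not enough to include a song.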
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest detection. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. These common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as
CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

ρ(CT, c) = Σᵢ₌₁¹² (CTᵢ − C̄T)(cᵢ − c̄) / (σ_CT σ_c)
where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".
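The template-matching step can be sketched directly from the formula above: correlate a chroma frame against each binary chord template and keep the best-scoring chord. Only two of the 48 possible templates (4 chord types × 12 roots) are shown here, and the function names are mine:

```python
import math

# 12-element binary chord templates, index 0 = C.  The full system would
# have major, minor, dominant 7, and minor 7 templates on all 12 roots.
TEMPLATES = {
    "C major": [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    "C minor": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
}

def correlation(template, chroma):
    """The coefficient from the text: centred dot product of the template
    and the chroma frame, divided by the product of their standard
    deviations (constant factors do not affect the ranking)."""
    n = len(template)
    mt = sum(template) / n
    mc = sum(chroma) / n
    cov = sum((t - mt) * (c - mc) for t, c in zip(template, chroma)) / n
    st = math.sqrt(sum((t - mt) ** 2 for t in template) / n)
    sc = math.sqrt(sum((c - mc) ** 2 for c in chroma) / n)
    return cov / (st * sc)

def best_chord(chroma):
    """Pick the template with the highest coefficient for this frame."""
    return max(TEMPLATES, key=lambda name: correlation(TEMPLATES[name], chroma))

# A frame with strong C, E, and G should match the C major template.
frame = [1.0, 0.1, 0.2, 0.1, 0.9, 0.1, 0.1, 0.8, 0.1, 0.2, 0.1, 0.1]
print(best_chord(frame))  # C major
```

The same ranking emerges whichever standard correlation normalization is used, since all templates are compared against the same frame.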
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below, I have included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: pitch-processing pipeline, illustrated with "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho (e.g. F# major, template (0,1,0,0,0,0,1,0,0,0,1,0)). (4) For each pair of adjacent chords, calculate the change between them (e.g. F# major to G major, a major-to-major shift, chord change code 6) and increment its count in a table of chord change frequencies (192 possible chord changes). The result is a final 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
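Putting the last two steps together, here is a sketch of building the per-second chord change vector. The exact 192-way code numbering used in the thesis is not specified in the text, so the encoding below (root shift combined with the quality pair) is a hypothetical but consistent stand-in:

```python
QUALITIES = ["maj", "min", "dom7", "min7"]  # the 4 chord types from Mauch's templates

def change_code(root1, qual1, root2, qual2):
    """Hypothetical encoding of a chord change as one of 12 * 4 * 4 = 192
    codes: the root shift in semitones (0-11, key-independent) combined with
    the quality of each chord.  The thesis's own numbering may differ; only
    the 192-way partition matters for the feature vector."""
    shift = (root2 - root1) % 12
    return shift * 16 + QUALITIES.index(qual1) * 4 + QUALITIES.index(qual2)

def chord_change_vector(chords, duration_seconds):
    """Count each change between adjacent chords, then normalize the counts
    to changes per second, as described in the text."""
    counts = [0] * 192
    for (r1, q1), (r2, q2) in zip(chords, chords[1:]):
        counts[change_code(r1, q1, r2, q2)] += 1
    return [c / duration_seconds for c in counts]

# F# major (root 6) to G major (root 7) and back, over a 212-second song.
vec = chord_change_vector([(6, "maj"), (7, "maj"), (6, "maj")], 212.0)
print(sum(1 for v in vec if v > 0))  # 2 distinct change codes observed
```

Because the code depends only on the shift between roots, transposing a song to another key produces the same vector, which is the key-independence property the smoothing-and-differencing step was designed to achieve.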
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias toward any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
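The timbre pipeline above (fit GMMs over a range of cluster counts, keep the one with the lowest BIC, then count cluster assignments) can be sketched with scikit-learn. The thesis scanned 10 to 100 components over the 16,800 real frames; this sketch uses synthetic data and a small range for speed:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy stand-in for the matrix of sampled 12-dimensional timbre frames.
frames = np.vstack([
    rng.normal(-3.0, 0.5, size=(300, 12)),
    rng.normal(3.0, 0.5, size=(300, 12)),
])

# Fit a GMM for each candidate cluster count and keep the one with the
# lowest Bayes Information Criterion.
candidates = range(1, 6)
models = [GaussianMixture(n_components=k, random_state=0).fit(frames)
          for k in candidates]
bics = [m.bic(frames) for m in models]
best = models[int(np.argmin(bics))]
print("best cluster count by BIC:", best.n_components)

# Each frame is then assigned to its most likely timbre cluster; a
# per-song count over these clusters becomes the timbre feature vector.
assignments = best.predict(frames)
counts = np.bincount(assignments, minlength=best.n_components)
```

Dividing `counts` by the song duration would complete the normalization step described above.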
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data were processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as the timbre data. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating the copies to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
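The duplication strategy amounts to a one-line feature builder. The number of copies is not fixed in the text, so the value below is an illustrative assumption chosen to bring the timbre block close to the size of the 192 pitch features:

```python
def weighted_features(chord_changes, timbre_freqs, timbre_copies=4):
    """Concatenate the 192 chord-change frequencies with several copies of
    the 46 timbre frequencies so the two feature groups carry roughly equal
    weight in the clustering.  With 4 copies (an assumed value), the timbre
    block grows to 4 * 46 = 184 values, close to the 192 pitch values."""
    return list(chord_changes) + list(timbre_freqs) * timbre_copies

song = weighted_features([0.0] * 192, [0.0] * 46)
print(len(song))  # 376 = 192 + 4 * 46
```

Since Euclidean distance under a Gaussian mixture sums squared differences over coordinates, repeating a block of coordinates is equivalent to multiplying that block's contribution to the distance, which is the weighting effect intended.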
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently stands, we would be using the Dirichlet Process in a way it mathematically should not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.
After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s, like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify the different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, putting semantic interpretations on what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial, space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental, and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but echoing
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin; rhythmic emphasis on first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, leaving a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial, metal, and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist, ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast, ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the year distributions of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
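The role of α can be illustrated through the Chinese Restaurant Process view of the Dirichlet Process: each new song joins an existing cluster with probability proportional to that cluster's size, or starts a new cluster with probability proportional to α. The following is a minimal simulation of that prior only (not the thesis's fitted DP-GMM, whose cluster counts also depend on the pitch and timbre likelihoods); it shows that larger α tends to produce more clusters.

```python
import random

def crp_partition(n, alpha, rng):
    """Sample cluster sizes for n items from a Chinese Restaurant Process."""
    counts = []
    for i in range(n):
        # open a new cluster with probability alpha / (i + alpha)
        if rng.random() < alpha / (i + alpha):
            counts.append(1)
        else:
            # otherwise join an existing cluster, chosen proportionally to size
            r = rng.random() * i
            acc = 0.0
            for t, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[t] += 1
                    break
    return counts

rng = random.Random(0)
for alpha in (0.05, 0.1, 0.2, 5.0):
    mean_k = sum(len(crp_partition(2000, alpha, rng)) for _ in range(20)) / 20.0
    print(alpha, mean_k)
```

Under this prior alone the expected number of clusters grows roughly like α log(n/α), so the data likelihood is doing most of the work in producing the 14 to 22 clusters observed in the experiments.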
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, playing the sounds, and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up taking my subjective summaries of each cluster and comparing them against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
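The chord-change type numbers quoted earlier (0, 60, 120, 180) follow from the encoding used in the appendix code: a chord is a (kind, root) pair with kinds numbered 1 to 4 (major, minor, dominant 7th major, dominant 7th minor), and each transition maps to one of 192 = 12 × 16 categories. A small sketch of that encoding:

```python
def chord_change_type(c1, c2):
    """Map a pair of (chord_kind, root_note) chords to one of 192 categories.

    chord_kind: 1 = major, 2 = minor, 3 = dominant 7th major, 4 = dominant 7th minor
    root_note:  0..11 (C natural through B natural)
    Mirrors the key_shift / note_shift arithmetic in the appendix code.
    """
    kind1, note1 = c1
    kind2, note2 = c2
    note_shift = (note2 - note1) % 12        # 0..11 semitone shift of the root
    key_shift = 4 * (kind1 - 1) + kind2      # 1..16 ordered pair of chord kinds
    return 12 * (key_shift - 1) + note_shift # 0..191

# the four "no note change" categories named in the analysis
assert chord_change_type((1, 0), (1, 0)) == 0    # major -> major
assert chord_change_type((2, 5), (2, 5)) == 60   # minor -> minor
assert chord_change_type((3, 7), (3, 7)) == 120  # dom7 major -> dom7 major
assert chord_change_type((4, 2), (4, 2)) == 180  # dom7 minor -> dom7 minor
```

Any category that is a multiple of 12 with note_shift of zero keeps the same root, which is why the four same-kind, same-root categories dominate in songs that stay in one key.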
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing those clusters to the ones formed for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, spacey sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music
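Assigning a smoothed timbre frame to its closest learned timbre category, which the appendix code delegates to msd_utils.find_most_likely_timbre_category, can be sketched as plain Euclidean nearest-centroid matching. The two centroids below are made-up stand-ins for illustration, not rows of the actual TIMBRE_CLUSTERS table.

```python
import math

def nearest_timbre_category(frame, centroids):
    """Return the index of the centroid closest to a 12-D timbre frame."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(centroids)), key=lambda i: dist(frame, centroids[i]))

# two hypothetical centroids: a flat "quiet pad" profile and a "sharp attack" profile
centroids = [
    [0.1] * 12,
    [3.0, -1.0] + [0.5] * 10,
]
frame = [2.8, -0.9] + [0.4] * 10  # resembles the second centroid
print(nearest_timbre_category(frame, centroids))  # -> 1
```

Counting how many frames of a song land in each category, then dividing by the song's duration, yields the timbre-category histograms plotted for each cluster.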
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters for the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing the clusters: the y-axis values are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies for fixing those weaknesses; I then offer potential paths for researchers to build upon my experiment and close with final remarks regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.
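One inexpensive version of the timbre-similarity comparison suggested by [15] would compare the duration-normalized timbre-category histograms of two songs directly, for example with cosine similarity. The histograms below are illustrative stand-ins, not values taken from the dataset.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two timbre-category histograms."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# hypothetical per-second timbre-category counts for three songs
song_a = [0.9, 0.1, 0.0, 0.2]
song_b = [0.8, 0.2, 0.1, 0.2]  # close to song_a
song_c = [0.0, 0.1, 0.9, 0.0]  # dominated by a different category
print(round(cosine_similarity(song_a, song_b), 3))
print(round(cosine_similarity(song_a, song_c), 3))
```

Similarity scores like these would give a quantitative check on whether songs assigned to the same cluster actually share timbre profiles, rather than relying on listening alone.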
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue, building an effective corpus of music data for the MSD and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small research group without extensive funding and influence. Once these problems are resolved (the dataset, the songs accessed from the dataset, and the methods for comparing songs to each other), the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been analyzed mostly from a curious outsider's perspective: qualitatively described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of each electronic song out of the
Million Song Dataset HDF5 files.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_match in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate the mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [[ 1.38679881e-01,  3.95702571e-02,  2.65410235e-02,  7.38301998e-03,
                    -1.75014636e-02, -5.51147732e-02,  8.71851698e-03, -1.17595855e-02,
                     1.07227900e-02,  8.75951680e-03,  5.40391877e-03,  6.17638908e-03],
                   [ 3.14344510e+00,  1.17405599e-01,  4.08053561e+00, -1.77934450e+00,
                     2.93367968e+00, -1.35597928e+00, -1.55129489e+00,  7.75743158e-01,
                     6.42796685e-01,  1.40794256e-01,  3.37716831e-01, -3.27103815e-01],
                   [ 3.56548165e-01,  2.73288705e+00,  1.94355982e+00,  1.06892477e+00,
                     9.89739475e-01, -8.97330631e-02,  8.73234495e-01, -2.00747009e-03,
                     3.44488367e-01,  9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
                   [ 4.22442037e-01,  4.14115783e-01,  1.43926557e-01, -1.16143322e-01,
                    -5.95186216e-02, -2.36927188e-01, -6.83151409e-02,  9.86816882e-02,
                     2.43219098e-02,  6.93558977e-02,  6.80121418e-03,  3.97485360e-02],
                   [ 1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,
                     1.92334219e-01, -2.83421048e-01,  2.15787541e-01,  1.14840341e-01,
                    -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
                   [ 1.96565167e-01,  4.98702717e-02, -3.43697282e-01,  2.54170701e-01,
                     1.12441266e-02,  1.54740401e-01, -4.70447408e-02,  8.10868802e-02,
                     3.03736697e-03,  1.43974944e-03, -2.75044913e-02,  1.48634678e-02],
                   [ 2.21364497e-01, -2.96205105e-01,  1.57754028e-01, -5.57641279e-02,
                    -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02,
                     1.66886836e-01,  6.46238260e-02,  1.24093863e-02, -2.09274345e-02],
                   [ 2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,
                     4.98638770e-02, -1.26572488e-01, -1.11181799e-01,  3.25075635e-02,
                     2.01416694e-02, -5.69216463e-02,  2.61922912e-02,  8.30817468e-02],
                   [ 1.62304042e-01, -7.34813956e-03, -2.02552550e-01,  1.80106705e-01,
                    -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02,
                     1.02883628e-02,  3.84878478e-02, -8.72920419e-03,  2.37291230e-02],
                   [ 1.69023095e-01,  6.81311168e-02, -3.71039856e-02, -2.13139780e-02,
                    -4.18752028e-03,  1.36407740e-01,  2.58515825e-02, -4.10328777e-04,
                     2.93149920e-02, -1.97874734e-02,  2.01177066e-02,  4.29260690e-03],
                   [ 4.16829358e-01, -1.28384095e+00,  8.86081556e-01,  9.13717416e-02,
                    -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02,
                     3.47472066e-02, -3.53047665e-02,  5.58354602e-02, -5.06222122e-02],
                   [ 3.83948137e-01,  1.06020034e-01,  4.01191058e-01,  1.49470482e-01,
                    -9.58422411e-02, -4.94473336e-02,  2.27589858e-02, -5.67352733e-02,
                     3.84666644e-02, -2.15828055e-02, -1.67817151e-02,  1.15426241e-01],
                   [ 9.07946444e-01,  3.26120397e+00,  2.98472002e+00, -1.42615404e-01,
                     1.29886103e+00, -4.53380431e-01,  1.54008478e-01, -3.55297093e-02,
                    -2.95809181e-01,  1.57037690e-01, -7.29692046e-02,  1.15180285e-01],
                   [ 1.60870896e+00, -2.32038235e+00, -7.96211044e-01,  1.55058968e+00,
                    -2.19377663e+00,  5.01030526e-01, -1.71767279e+00, -1.36642470e+00,
                    -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
                   [ 6.42870687e-01,  1.34486839e+00,  2.16026845e-01, -2.13180345e-01,
                     3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04,
                     4.95054274e-03,  4.67013422e-02, -1.80823854e-02,  1.25808320e-01],
                   [ 1.16780496e+00,  2.28141229e+00, -3.29418720e+00, -1.54239912e+00,
                     2.12372153e-01,  2.51116768e+00,  1.84273560e+00, -4.06183916e-01,
                     1.19175125e+00, -9.24407446e-01,  6.85444429e-01, -6.38729005e-01],
                   [ 2.39097414e-01, -1.13382447e-02,  3.06327342e-01,  4.68182987e-03,
                    -1.03107607e-01, -3.17661969e-02,  3.46533705e-02,  1.46440386e-02,
                     6.88291154e-02,  1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
                   [ 1.74850329e-01, -1.86077411e-01,  2.69285838e-01,  5.22452803e-02,
                    -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02,
                    -2.61300268e-03, -6.94872458e-03,  1.20157063e-02,  2.01341977e-02],
                   [ 1.93220674e-01,  1.62738332e-01,  1.72794061e-02,  7.89933755e-02,
                     1.58494767e-01,  9.04541006e-04, -3.33177052e-02, -1.42411500e-01,
                    -1.90471155e-02, -2.41622739e-02, -2.57382438e-02,  2.84895062e-02],
                   [ 3.31179197e+00, -1.56765268e-01,  4.42446188e+00,  2.05496297e+00,
                     5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00,
                     5.41756637e-01, -3.15541339e-02, -1.58404846e+00,  7.37887234e-01],
                   [ 2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,
                     5.58739133e-01, -3.45340886e-01,  2.36469930e-01, -2.51770230e-02,
                    -4.41670143e-01, -1.73364633e-01,  9.92353986e-03,  1.01775476e-01],
                   [ 3.13672832e+00,  1.55128891e+00,  4.60139512e+00,  9.82477544e-01,
                    -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01,
                    -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
                   [ 8.50714148e-01,  2.28658856e-01, -3.65260753e+00,  2.70626948e+00,
                    -1.90441544e-01,  5.66625676e+00,  1.77531510e+00,  2.39978921e+00,
                     1.10965660e+00,  1.58484130e+00, -1.51579214e-02,  8.64324026e-01],
                   [ 1.14302559e+00,  1.18602811e+00, -3.88130412e+00,  8.69833825e-01,
                    -8.23003310e-01, -4.23867795e-01,  8.56022598e-01, -1.08015106e+00,
                     1.74840192e-01, -1.35493558e-02, -1.17012561e+00,  1.68572940e-01],
                   [ 3.54117814e+00,  6.12714769e-01,  7.67585243e+00,  2.50391333e+00,
                     1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01,
                    -1.20787368e+00, -4.13954661e-01, -4.62561948e-01,  6.78297871e-01],
                   [ 8.31843044e-01,  4.41635485e-01,  7.00724425e-02, -4.72159900e-02,
                     3.08326493e-01, -4.47009822e-01,  3.27806057e-01,  6.52370380e-01,
                     3.28490360e-01,  1.28628172e-01, -7.78065861e-02,  6.91343399e-02],
                   [ 4.90082031e-01, -9.53180204e-01,  1.76970476e-01,  1.57256960e-01,
                    -5.26196238e-02, -3.19264458e-01,  3.91808304e-01,  2.19368239e-01,
                    -2.06483291e-01, -6.25044005e-02, -1.05547224e-01,  3.18934196e-01],
                   [ 1.49899454e+00, -4.30708817e-01,  2.43770498e+00,  7.03149621e-01,
                    -2.28827845e+00,  2.70195855e+00, -4.71484280e+00, -1.18700075e+00,
                    -1.77431396e+00, -2.23190236e+00,  8.20855264e-01, -2.35859902e-01],
                   [ 1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,
                     6.93277338e-02, -1.31034684e-01, -1.54955924e-03,  2.48094288e-02,
                    -3.09576314e-02, -1.66369415e-03,  1.48904987e-04, -1.42151992e-02],
                   [ 6.52394765e-01, -6.81024464e-01,  6.36868117e-01,  3.04950208e-01,
                     2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01,
                     2.04399765e-01,
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw msd data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
'''given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
Acknowledgements
I would like to thank Professor Ramon Van Handel for advising me on my thesis. You helped me figure out how to narrow down my goals into a concrete topic and provided useful input on how to model and frame my problem effectively. I would also like to thank the Princeton ORFE department for providing funding to download and manage the dataset I used for this project. Michael Bino and the Computational Science and Engineering Support group (CSES) were incredibly useful for helping me set up and run my programs on the Princeton servers. Without your help I would have had a much harder time getting my 300GB dataset of music to play nice. I would also like to thank Jeffrey Scott Dwoskin for providing the LaTeX template from which I wrote this thesis. And finally, I would like to thank my family and friends, especially Lucas and Kathryn, for providing continuous support and feedback. The work we all poured into our theses is incredible, and we've made it through this sometimes rocky journey in the greatest university of all.
On a personal note: regardless of whether you are a current Princeton undergraduate or are just interested in my work, push yourself beyond your comfort zone and don't let grades or other people's opinions get in your way. Take classes and join new groups that reflect your passions. At the same time, love yourself. Take care of your body and have some fun without feeling guilty. And finally, form great relationships. While Princetonians sometimes appear hypercompetitive and forced, they are genuinely sweet and brilliant people who you will treasure for life. These four years at Princeton have gone by in a flash, and in the whirlwind of highs and lows I've gone through, these are the most important lessons I've learned.
To my parents
Contents
Abstract iii
Acknowledgements iv
List of Tables viii
List of Figures ix
1 Introduction 1
1.1 Background Information 1
1.2 Literature Review 3
1.3 The Dataset 10
2 Mathematical Modeling 12
2.1 Determining Novelty of Songs 12
2.2 Feature Selection 14
2.3 Collecting Data and Preprocessing Selected Features 20
2.3.1 Collecting the Data 20
2.3.2 Pitch Preprocessing 21
2.3.3 Timbre Preprocessing 25
3 Results 27
3.1 Methodology 27
3.2 Findings 29
3.2.1 α = 0.05 29
3.2.2 α = 0.1 33
3.2.3 α = 0.2 38
3.3 Analysis 46
4 Conclusion 53
4.1 Design Flaws in Experiment 53
4.2 Future Work 55
4.3 Closing Remarks 56
A Code 57
A.1 Pulling Data from the Million Song Dataset 57
A.2 Calculating Most Likely Chords and Timbre Categories 58
A.3 Code to Compute Timbre Categories 60
A.4 Helper Methods for Calculations 61
Bibliography 68
List of Tables
3.1 Song cluster descriptions for α = 0.05 33
3.2 Song cluster descriptions for α = 0.1 38
3.3 Song cluster descriptions for α = 0.2 45
List of Figures
1.1 A user's taste profile generated by Spotify 4
1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975 8
2.1 scikit-learn example of GMM vs. DPGMM and tuning of α 15
2.2 Number of Electronic Music Songs in Million Song Dataset from Each Year 26
3.1 Song year distributions for α = 0.05 31
3.2 Timbre and pitch distributions for α = 0.05 32
3.3 Song year distributions for α = 0.1 35
3.4 Timbre and pitch distributions for α = 0.1 37
3.5 Song year distributions for α = 0.2 41
3.6 Timbre and pitch distributions for α = 0.2 44
Chapter 1
Introduction
1.1 Background Information
Electronic Music (EM) is an increasingly popular genre of music with an immense presence and influence on modern culture. Because the genre is new as a whole and is arguably more loosely structured than other genres - technology has enabled the creation of a wide range of sounds and easy blending of existing and new sounds alike - formal analysis, especially mathematical analysis, of the genre is fairly limited and has only begun growing in the past few years. As a fan of EM, I am interested in exploring how the genre has evolved over time. More specifically, my goal with this project was to design some structure or model that could help me identify which EM artists have contributed the most stylistically to the genre. Oftentimes famous EM artists do not create novel-sounding music but rather popularize an existing style, and the motivation of this study is to understand who has stylistically contributed the most to the EM scene versus those who have merely popularized aspects of it.
As the study progressed, the manner in which I constructed my model lent itself to a second goal for the thesis: suggesting new ways in which we can imagine EM genres.
While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, and especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely due to existing for a shorter amount of time and being less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not exist until 2012, when Professor Mark J. Butler from Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that allow a user to relate every genre to each other and easily observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped by more general genres and represents chronological evolutions by connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative. While each subgenre contains an explanation of typical rhythm and sounds and includes well-known songs indicative of the style, the guide was created by someone who used historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach towards imagining the entire landscape of EM. The results may confirm Ishkur's Guide's findings, in which case his guide is given additional merit with mathematical evidence, or they may be different, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically-based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that historically makes sense. On the other hand, my model is history-agnostic and may not realize the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and perhaps reveals different musical styles that are more similar to each other than history would otherwise imply. This way of thinking of music genres, while unconventional, is another way of imagining EM.
The practice of quantitatively analyzing music has exploded in the last decade thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what this research can contribute.
1.2 Literature Review
The analysis of quantitative music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. First looking at the consumer-based research, Spotify and Pandora are two of the most prominent groups, and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists, or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created from Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the cluster come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music as well as training various algorithms on a regular basis by "listening" to new songs [4][5].
Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres of music. This may serve as a good starting point to compare my final results to, but my study aims to be as context-free as possible by attaching no preconceived notions of music styles or genres, instead looking at features that could be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking where skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.
In the academic realm there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCC) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to intrinsically how loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time music has been becoming louder and less diverse:
The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase all the songs I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
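The transposition and discretization steps described above can be sketched in a few lines. This is an illustrative reconstruction, not the study's released code; the function names and the 0.5 threshold are my own choices:

```python
# Illustrative sketch of the preprocessing described above (function names
# and the 0.5 threshold are assumptions, not taken from the study itself).
def transpose_to_common_key(pitch_vector, key):
    """Rotate a 12-dim pitch vector so every song shares one tonal context."""
    return [pitch_vector[(i + key) % 12] for i in range(12)]

def binarize(pitch_vector, threshold=0.5):
    """Binary discretized pitch description: 1 if a pitch class is active."""
    return [1 if p >= threshold else 0 for p in pitch_vector]

# a segment whose strongest pitch classes form a C major triad (C, E, G)
segment = [0.9, 0.1, 0.2, 0.1, 0.8, 0.1, 0.1, 0.85, 0.2, 0.1, 0.1, 0.1]
print(binarize(transpose_to_common_key(segment, key=0)))
# [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
```

Once every song is reduced to such binary vectors in a common key, songs can be compared directly regardless of the key they were recorded in.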
While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975
While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or a dictionary of pitch and timbre terms that all songs can be compared to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, music sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but the song will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song.
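The chord-change table described above can be sketched as follows; this is my own minimal illustration of the idea, not code from Mauch's study:

```python
from collections import Counter

# Minimal illustration (my own, not from Mauch's study): given the most
# likely chord label at each time segment, tally every chord-to-chord
# change and normalize the counts into frequencies.
def chord_change_frequencies(chords):
    changes = Counter(zip(chords, chords[1:]))  # consecutive transitions
    total = sum(changes.values())
    return {pair: count / total for pair, count in changes.items()}

chords = ["C", "G", "Am", "F", "C", "G"]
freqs = chord_change_frequencies(chords)
print(freqs[("C", "G")])  # 0.4: two of the five transitions are C -> G
```

Because the table records relative movements rather than absolute pitches, a song and its transposition produce the same frequency table.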
Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords for pitches with which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM) by iterating over k=1 to k=N clusters, where N is a large number, running the GMM on each prior assumption of k clusters and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
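The selection loop just described can be sketched with scikit-learn's GaussianMixture [10]. The data below is synthetic and the search range is my own choice, so this is only a sketch of the procedure, not Mauch's implementation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic stand-in for per-segment 12-dim timbre vectors (two clear clusters)
X = np.vstack([rng.normal(0.0, 1.0, (100, 12)), rng.normal(5.0, 1.0, (100, 12))])

best_k, best_bic, best_model = None, np.inf, None
for k in range(1, 8):  # the study iterates up to a large N; 8 keeps the demo fast
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic = gmm.bic(X)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

print(best_k)                   # the selected number of timbre clusters
print(best_model.means_.shape)  # (best_k, 12): one mean timbre vector per cluster
```

BIC penalizes extra components, so the loop stops rewarding larger k once added clusters no longer improve the fit enough to justify their parameters.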
For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but I would also have to take careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources for this thesis to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I needed access to an extensive database of
music. Until recently, acquiring a substantial corpus of music data was a difficult and
costly task. It is illegal to download music audio files from video and music-sharing
sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer
90-second previews of songs, but segments of songs, usually segments that showcase
the chorus, are not a reliable way to capture the entire essence of a song. Even if
I were to legally download entire audio files for free, I would run into additional
issues. Obtaining a high-quality corpus of song data would be challenging, since
scripts that crawl music-sharing platforms may not capture all of the music I am
looking for. And once I had the audio files, I would have to perform audio processing
techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem.
The Million Song Dataset (MSD) is a collection of metadata for one million music
tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz,
7digital, and Last.fm, have contributed different pieces of metadata. Each song is
represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a
JSON object. The fields encompass topical features, such as the song title, artist,
and release date, as well as lower-level features, such as the loudness, starting beat
times, pitches, and timbre of several segments of the song [9]. While the MSD is
the largest free and open-source music metadata dataset I could find, there is no
guarantee that it adequately covers the entire spectrum of EM artists and songs.
This quality limitation is important to consider throughout the study. A quick look
through the songs, including the subset of data I worked with for this report, showed
that there were several well-known artists and songs from the EM scene. Therefore,
while the MSD may not contain all desired songs for this project, it contains an
adequate number of relevant songs to produce some meaningful results. Additionally,
the groundwork for modeling the similarities between songs and identifying
groundbreaking ones is the same regardless of the songs included, and the following
methodologies can be implemented on any similarly formatted dataset, including one
with songs that might currently be missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be,
an important aspect of my research. My problem, how to mathematically determine
which songs were unique for their time, requires an algorithm in which each song is
introduced in chronological order, either joining an existing category or starting a
new category based on its musical similarity to songs already introduced. Clustering
algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the
partitioning of a dataset into a predetermined number of clusters, assume a fixed
number of clusters. While this process would work if we knew exactly how many
genres of EM existed, if we guess wrong, our end results may contain clusters that
are wrongly grouped together or separated. It is much better to apply a clustering
algorithm that does not make any assumptions about this number.
One particularly promising approach that addresses the issue of the number of
clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are
useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters, and (2) by sorting the songs
in chronological order before running the algorithm and keeping track of which
songs are categorized under each cluster, we can observe the earliest songs in each
cluster and consequently infer which songs were responsible for creating new
clusters. The DP is controlled by a concentration parameter α. The expected number
of clusters formed is directly proportional to the value of α, so the higher the
value of α, the more likely new clusters will be formed [10]. Regardless of the
value of α, as the number of data points introduced increases, the probability of
a new group being formed decreases. That is, a "rich get richer" policy is in place,
and existing clusters tend to grow in size. Tweaking the value of the tunable
parameter α is an important part of the study, since it determines the flexibility
given to forming a new cluster. If the value of α is too small, then the criteria
for forming clusters will be too strict, and data that should be in different
clusters will be assigned to the same cluster. On the other hand, if α is too large,
the algorithm will be too sensitive and assign similar songs to different clusters.
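The "rich get richer" behavior and the role of α can be seen in a toy simulation of the Chinese restaurant process, the sequential scheme underlying the DP. The point counts and α values below are illustrative, not taken from the song data:

```python
import random

def crp_cluster_sizes(n_points, alpha, seed=0):
    """Chinese restaurant process: point i starts a new cluster with
    probability alpha / (i + alpha), otherwise joins an existing cluster
    with probability proportional to that cluster's current size."""
    rng = random.Random(seed)
    sizes = []
    for i in range(n_points):
        r = rng.uniform(0, i + alpha)
        if r < alpha or not sizes:
            sizes.append(1)                 # open a new cluster
        else:
            acc = alpha                     # walk the size-weighted clusters
            for j, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[j] += 1
                    break
            else:
                sizes[-1] += 1              # guard against float edge case

    return sizes

few = crp_cluster_sizes(2000, alpha=0.5)    # small alpha: a handful of clusters
many = crp_cluster_sizes(2000, alpha=50.0)  # large alpha: far more clusters
```

Running both simulations on the same number of points shows many more (and smaller) clusters under the larger α, while in both cases the earliest, largest clusters keep absorbing most new points.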
The implementation of the DP was achieved using scikit-learn's library and API for
the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the
Dirichlet Process model used to cluster the data. More specifically, scikit-learn's
implementation of the DPGMM uses the stick-breaking method, one of several equally
valid methods to assign songs to clusters [11]. While the mathematical details for
this algorithm can be found in the following citation [12], the most important
aspects of the DPGMM are the arguments that the user can specify and tune. The first
of these tunable parameters is the value α, which is the same parameter as the α
discussed in the previous paragraph. As seen on the right side of Figure 2.1,
properly tuning α is key to obtaining meaningful clusters. The center image has α
set to 0.01, which is too small and results in all of the data being placed in one
cluster. On the other hand, the bottom-right image has the same data set and α set
to 100, which does a better job of clustering. On a related note, the figure also
demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the
dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5
clusters as a prior and consequently clusters the data incorrectly, while the DPGMM
manages to limit the data to 2 clusters.
The second argument that the user inputs to the DPGMM is the data that
will be clustered. The scikit-learn implementation takes the data in the format
of a nested list (N lists, each of length m), where N is the number of data points
and m the number of features. While the format of the data structure is relatively
straightforward, choosing which numbers should be in the data was a challenge I
faced. Selecting the relevant features of each song to be used in the algorithm
will be expounded upon in the next section, "Feature Selection."
The last argument that a user inputs to the scikit-learn DPGMM implementation
indicates the upper bound for the number of clusters. The Dirichlet Process then
determines the best number of clusters for the data between 1 and the upper bound.
Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper
bound of 50 clusters and focused more on the tuning of α to modify the number of
clusters formed.
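As a note on the API: the DPGMM class used here belonged to older scikit-learn releases; in current versions the same stick-breaking model is exposed as BayesianGaussianMixture with weight_concentration_prior_type="dirichlet_process". A sketch of the setup on synthetic data (the feature matrix and parameter values are placeholders, not the thesis's actual inputs):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in for the song feature matrix: N songs x m features
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (150, 4)), rng.normal(6, 1, (150, 4))])

dpgmm = BayesianGaussianMixture(
    n_components=50,                      # upper bound on the cluster count
    weight_concentration_prior=1.0,       # the concentration parameter alpha
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
n_used = len(set(labels))                 # clusters actually populated
```

Even with the upper bound at 50, the fitted model typically populates only as many clusters as the data supports, which is exactly the flexibility described above.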
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features
to be used for clustering. In other words, when we organize the songs into clusters,

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

we need to ensure that each cluster is distinct in a way that is statistically and
intuitively logical. In the Million Song Dataset [9], each song is represented as a
JSON object containing several fields. These fields are candidate features to be used
in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up"
by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
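Since each track is an HDF5 file, its arrays can be read directly, for example with h5py. The sketch below builds a toy file mimicking the dataset paths as I understand the MSD layout ("analysis/segments_pitches" and "analysis/segments_timbre" are assumptions here); the MSD's official hdf5_getters wrappers are the canonical way to access real track files, which would be indexed the same way:

```python
import h5py
import numpy as np

# Build a toy file mimicking the assumed MSD layout; a real track file
# would be opened and indexed identically.
rng = np.random.default_rng(0)
with h5py.File("toy_track.h5", "w") as f:
    f.create_dataset("analysis/segments_pitches", data=rng.random((935, 12)))
    f.create_dataset("analysis/segments_timbre", data=rng.normal(size=(935, 12)))

with h5py.File("toy_track.h5", "r") as f:
    pitches = f["analysis/segments_pitches"][:]  # (n_segments, 12) chroma strengths
    timbre = f["analysis/segments_timbre"][:]    # (n_segments, 12) MFCC-like values
```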
When choosing features, my main goal was to use features that would most
likely yield meaningful results, yet also be simple and make sense to the average
person. The definition of "meaningful" results is subjective, as every music listener
has his or her own opinion of what constitutes different types of music, but some
common features most people tend to differentiate songs by are pitch, rhythm, and
the types of instruments used. The following specific fields provided in each song
object fall under these three terms:
Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or
note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like
features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor
between songs, since it reveals the patterns of notes that occur. Additionally,
other research papers that quantitatively examine songs, like Mauch's, look at pitch
and employ a procedure that allows all songs to be compared with the same metric.
Likewise, timbre is intuitively a reliable differentiating feature, since it
captures the degree to which tones sound different despite having the same pitch.
Therefore, segments_timbre is another feature that is considered in each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these
features appear to be useful, as they indicate the rhythm of a song in one way or
another. However, none of these features are as useful as the pitch and timbre
features. While tempo is one factor in differentiating genres of EDM, and music in
general, tempo alone is not a driving force of musical innovation. Certain genres
of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos,
but the tempo is supplemented with a sound unique to the genre. Conceiving new
arrangements of pitches, combining instruments in new ways, and inventing new
types of sounds are novel; speeding up or slowing down existing sounds is not.
Including tempo as a feature could actually add noise to the model, since many genres
overlap in their tempos. And finally, tempo is measured indirectly when the pitch
and timbre features are normalized for each song: everything is measured in units of
"per second," so faster songs will have higher quantities of pitch and timbre features
each second. Time signature can be dismissed from the candidate features for the
same reason as tempo: many genres share the same time signature, and including
it in the feature set would only add more noise. beats_start looks like a more
promising feature since, like segments_pitches and segments_timbre, it consists of
a vector of values. However, difficulties arise when we begin to think about how
exactly we can utilize this information. Since each song varies in length, we need
a way to compare songs of different durations on the same level. One approach could
be to perform basic statistics on the distance between each beat, for example
calculating the mean and standard deviation of this distance. However, the
normalized pitch and timbre information already captures this data. Another
possibility is detecting certain patterns of beats, which could differentiate the
syncopated dubstep or glitch beats from the steady pulse of electro-house. But once
again, every beat is accompanied by a sound with a specific timbre and pitch, so
this feature would not add any significantly new information.
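For concreteness, the basic inter-beat statistics considered and dismissed above are simple to compute from beats_start; the 120 BPM example grid below is made up for illustration:

```python
import numpy as np

def beat_interval_stats(beats_start):
    """Mean and standard deviation of the gaps between consecutive beats."""
    gaps = np.diff(np.asarray(beats_start, float))
    return gaps.mean(), gaps.std()

# A perfectly steady 120 BPM pulse: one beat every 0.5 seconds
mean_gap, std_gap = beat_interval_stats(np.arange(0, 60, 0.5))
# a steady pulse gives mean 0.5 s and standard deviation near 0
```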
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding on the features I wanted to use in my research, I first needed to
collect all of the electronic songs in the Million Song Dataset. The easiest reliable
way to achieve this was to iterate through each song in the database and save the
information for the songs where any of the artist genre tags in artist_mbtags matched
an electronic music genre. This measure was not fully accurate, because it looks at
the genre of the artist, not the song; however, specific genre information for each
song was not as easily accessible, so this indicator was nearly as good a substitute.
To generate a list of the genres that electronic songs would fall under, I manually
searched through a subset of the MSD to find all genres that seemed to be related to
electronic music. In the case of genres that were sometimes but not always electronic
in nature, such as disco or pop, I erred on the side of caution and did not include
them in the list of electronic genres. In these cases, false positives, such as
primarily rock songs that happen to have the disco label attached to the artist,
could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ["house", "techno", "drum and bass", "drum n bass",
    "drum'n'bass", "drumnbass", "drum 'n' bass", "jungle", "breakbeat",
    "trance", "dubstep", "trap", "downtempo", "industrial", "synthpop",
    "idm", "idm - intelligent dance music", "8-bit", "ambient",
    "dance and electronica", "electronic"]
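The tag-matching rule can be sketched as a small predicate; the helper name, the abbreviated genre set, and the lowercase normalization here are my additions:

```python
# Abbreviated version of the genre list above, for illustration only
TARGET_GENRES = {"house", "techno", "trance", "dubstep", "ambient",
                 "breakbeat", "idm", "electronic"}

def is_electronic_artist(artist_mbtags):
    """Keep a song when any of its artist's tags matches a target genre."""
    return any(tag.strip().lower() in TARGET_GENRES for tag in artist_mbtags)

keep = is_electronic_artist(["Techno", "german"])     # matches "techno"
skip = is_electronic_artist(["rock", "psychedelic"])  # no electronic tag
```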
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a
musically informed manner. The study first takes the raw sound data and converts it
into a distribution over each pitch, where 0 is no detection of the pitch and 1 the
strongest amount. Then it computes the most likely chord by comparing the 4 most
common types of chords in popular music (major, minor, dominant 7th, and minor 7th)
to the observed chord. The most common chords are represented as "template chords"
that contain 0's and 1's, where the 1's represent the notes played in the chord. For
example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, the Spearman's rho coefficient is
computed over every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)

where C̄T is the mean of the values in the template chord, σ_CT is the standard
deviation of the values in the chord, and the quantities for c are analogous. Note
that the summation is over each individual pitch in the 12 pitch classes. The chord
template with the highest value of ρ is selected as the chord for the time frame.
After this is performed for each time frame, the values are smoothed, and then the
change between adjacent chords is observed. The reasoning behind this step is that,
by measuring the relative distance between chords rather than the chords themselves,
all songs can be compared in the same manner even though they may have different
key signatures. Finally, the study takes the types of chord changes and classifies
them under 8 possible categories called "H-topics." These topics are more abstracted
versions of the chord changes that make more sense to a human, such as "changes
involving dominant 7th chords."
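The template-matching step can be sketched by correlating a chroma frame against the binary templates. Only two of the 48 templates (12 roots × 4 chord types) are shown, and np.corrcoef computes the ρ formula above up to a constant factor, so the winning template is the same:

```python
import numpy as np

# Binary chord templates over the 12 pitch classes, C = index 0
TEMPLATES = {
    "C major": np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], float),
    "C minor": np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0], float),
}

def best_chord(chroma):
    """Return the template most correlated with the observed chroma frame."""
    chroma = np.asarray(chroma, float)
    scores = {name: np.corrcoef(t, chroma)[0, 1] for name, t in TEMPLATES.items()}
    return max(scores, key=scores.get)

# A frame with strong energy at C, E and G resembles the C major template
frame = [0.9, 0.1, 0.0, 0.1, 0.8, 0.0, 0.1, 0.9, 0.0, 0.1, 0.0, 0.2]
```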
In my preliminary implementation of this method on an electronic dance music
corpus, I made a few modifications to Mauch's study. First, I smoothed out time
frames before computing the most probable chords, rather than smoothing the most
probable chords. I did this to save time and to reduce volatility in the chord
measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference,
which contains 935 time frames and lasts 212 seconds, 5 time frames corresponds to
roughly 1 second, and for preliminary testing this appeared to be a good interval
for each time block. Second, as mentioned in the literature section, I did not
abstract the chord changes into H-topics. This decision also stemmed from time
constraints, since deriving semantic chord meaning from EDM songs would require
careful research into the types of harmonies and sounds common in that genre of
music. Below is a high-level visualization of the pitch metadata found in a sample
song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord
change vector that I could then feed into the Dirichlet Process algorithm.
[Visualization omitted from transcript. It walks through the pitch pipeline for
"Firestarter" by The Prodigy: (1) start with the raw pitch data, an N×12 matrix
where N is the number of time frames in the song and 12 the number of pitch classes;
(2) average the distribution of pitches over every 5 time frames; (3) calculate the
most likely chord for each block using Spearman's rho (e.g., F♯ major =
(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0)); (4) for each pair of adjacent chords, compute
the change between them (e.g., a major-to-major change of step size 2 has chord
shift code 6, so chord_changes[6] += 1) and increment the corresponding count in a
table of chord change frequencies. The result is a 192-element vector where
chord_changes[i] is the number of times the chord change with code i occurred in
the song.]
A final step I took to normalize the chord change data was to divide the counts by
the length of the song, so that each song's number of chord changes was measured
per second.
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8]. After collecting all song metadata, I took a random
sample of 20 songs from each year starting at 1970. The reason I forced the sampling
to 20 randomly sampled songs from each year, and did not take a random sample of
songs from all years at once, was to prevent bias towards any type of sound. As seen
in Figure 2.2, there are significantly more songs from 2000-2011 than from before
2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard
deviation of the years is σ = 7.060. A "random sample" over all songs would almost
certainly include a disproportionate number of more recent songs. In order not to
miss out on sounds that may be more prevalent in older songs, I required a set
number of songs from each year. Next, from each randomly selected song I selected
20 random timbre frames, in order to prevent any biases in data collection within
each song. In total there were 42 × 20 × 20 = 16,800 timbre frames collected. Next,
I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the
number of clusters from 10 to 100 and selecting the number of clusters with the
lowest Bayes Information Criterion (BIC), a statistical measure commonly used to
select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then
re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre
components for each cluster formed.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs, each song now had the same 46 timbre clusters,
but different frequencies in each song. When reading in the metadata from each song,
I calculated the most likely timbre cluster each timbre frame belonged to and kept
a frequency count of all of the possible timbre clusters observed in the song.
Finally, as with the pitch data, I divided all observed counts by the duration of
the song in order to normalize each song's timbre counts.
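The frame-assignment and normalization steps can be sketched as below; the two-cluster GMM and synthetic frames stand in for the 46-cluster model fit on the real corpus:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def timbre_frequencies(timbre_frames, gmm, duration):
    """Assign each 12-d timbre frame to its most likely cluster and return
    per-second frequency counts, one entry per cluster."""
    counts = np.bincount(gmm.predict(timbre_frames), minlength=gmm.n_components)
    return counts / duration

# Toy corpus with two timbre "sounds" standing in for the 46-cluster model
rng = np.random.default_rng(2)
corpus = np.vstack([rng.normal(0, 1, (200, 12)), rng.normal(9, 1, (200, 12))])
gmm = GaussianMixture(n_components=2, random_state=0).fit(corpus)

song_frames = rng.normal(0, 1, (100, 12))  # a song built from the first sound
freqs = timbre_frequencies(song_frames, gmm, duration=200.0)
# 100 frames over 200 seconds: the per-second counts sum to 0.5
```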
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the
data. For each song, I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list, giving each song a total of 238
features. However, there is a problem with this setup: the pitch data will inherently
dominate the clustering process, since it contains almost 3 times as many features
as timbre. While there is no built-in function in scikit-learn's DPGMM process to
give different weights to each feature, I considered another possibility to remedy
this discrepancy: duplicating the timbre vector a certain number of times and
concatenating the copies to the feature set of each song. While this strategy runs
the risk of corrupting the feature set and turning it into something that does not
accurately represent each song, it is important to keep in mind that, even without
duplicating the timbre vector, the feature set consists of two separate feature sets
concatenated together. Therefore, timbre duplication appears to be a reasonable
strategy to weight pitch and timbre more evenly.
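The duplication strategy can be sketched as follows. The number of copies is not specified in the text, so the value of 4 here (giving 4 × 46 = 184 timbre entries against 192 pitch entries) is only an illustration:

```python
import numpy as np

def build_feature_vector(chord_changes, timbre_freqs, timbre_copies=4):
    """Concatenate the chord-change frequencies with several copies of the
    timbre frequencies so both feature families carry comparable weight."""
    return np.concatenate([chord_changes] + [timbre_freqs] * timbre_copies)

chords = np.zeros(192)        # per-second chord-change frequencies
timbre = np.ones(46)          # per-second timbre-cluster frequencies
features = build_feature_vector(chords, timbre)
# 192 + 4 * 46 = 376 features in total
```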
After this modification, I tweaked a few more parameters before obtaining my
final results. Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second, but it also had the undesired effect
of making the data values very small. Timbre and pitch frequencies per second were
almost always less than 10, and often hovered as low as 0.002 for nonzero values.
Because all of the values were very close to each other, using common values of α
in the range of 0.1 to 1000-2000 was insufficient to push the songs into different
clusters; as a result, every song fell into the same cluster. Increasing the value
of α by several orders of magnitude, to well over 10 million, fixed the problem, but
this solution presented two problems. First, tuning α to experiment with different
ways to cluster the music would be problematic, since I would have to work with
an enormous range of possible values for α. Second, pushing α to such high values
is not appropriate for the Dirichlet Process: extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters, and a
value of α that high is in principle always assigning each new song to a new cluster.
On the other hand, varying α between 0.1 and 1000, for example, presents a much
wider range of flexibility when assigning clusters. While clustering may be possible
by varying α an extreme amount with the data as it currently is, we would be using
the Dirichlet Process in a way it mathematically should not be used. Therefore,
multiplying all of the data by a constant value, so that we can work in the
appropriate range of α, is the ideal approach. After some experimentation, I found
that k = 10 was an appropriate scaling factor.
After initial runs of the Dirichlet Process, I found that there was a slight issue
with some of the earlier songs. Since I had only artist genre tags, not song-specific
tags, I chose songs based on whether any of the tags associated with the artist fell
under any electronic music genre, including the generic term 'electronic'. There
were some bands, mostly older ones from the 1960s and 1970s like Electric Light
Orchestra, which had some electronic music but mostly featured rock, funk, disco,
or another genre. Given that these artists featured mostly non-electronic songs, I
decided to exclude them from my study and generate a blacklist of these music
artists. While it was infeasible to look through every single song and determine
whether it was electronic or not, I was able to look over the earliest songs in each
cluster. These songs were the most important to verify as electronic, because early
non-electronic songs could end up forming new clusters and inadvertently create
clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify the different groups in which EM songs are
clustered and to identify the most unique artists and genres. While the second task
is straightforward, because it only requires looking at the earliest songs in each
cluster, the effectiveness of the first is difficult to gauge. While I can look at
the average chord change and timbre category frequencies in each cluster, as well
as other metadata, attaching semantic interpretations to what the music actually
sounds like and determining whether the music is clustered properly is a very
subjective process. For this reason, I ran the Dirichlet Process on the feature set
with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value,
examining similarities and differences in the clusters formed in each scenario in
the Discussion section. For each value of α, I set the upper limit of components,
or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19
clusters, respectively.
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below
are the distributions of years of the songs in each cluster (note that the Dirichlet
Process does not number the clusters exactly sequentially, so cluster numbers 5, 7,
and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and
descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs
in that cluster is shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist industrial space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and since they did not sound unique,
I discarded them from the clusters). Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but with echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin, rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them
for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has
a heavier left tail, indicating a larger share of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process these songs were lumped together under the common theme of dense, melodic
textures (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, has a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that clearly predates the later music
in the cluster is Jean-Michel Jarre. A French composer and pioneer in ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th
note rhythm that, combined with the ethereal synths and their chord
progressions, sounded very similar to trance music. In fact, I found that stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major
with no note change, type 60 to minor → minor with no note change, type 120 to
dominant 7th major → dominant 7th major with no note change, and type 180 to
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that adjacent chords in a song remain in the same key for the majority of the song.
The timbre categories, on the other hand, are more
difficult to intuitively interpret. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category, then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in mine. Instead, I compared my subjective
summaries of each cluster against the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data is very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
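The chord change categories referenced in this analysis come from the indexing scheme in the preprocessing code (Appendix A.2), where each detected chord is a pair of a key type (1-4 for major, minor, dominant 7th major, dominant 7th minor) and a root note (0-11). A minimal sketch of that mapping, confirming that same-chord, same-root transitions land on categories 0, 60, 120, and 180:

```python
def chord_shift(c1, c2):
    """Map a transition between chords (key_type 1-4, root 0-11) to one of 192 categories."""
    # circular distance between root notes
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]  # 1..16 over ordered key-type pairs
    return 12 * (key_shift - 1) + note_shift

# Same-key, same-root transitions: major, minor, dom7 major, dom7 minor
print([chord_shift((k, 0), (k, 0)) for k in (1, 2, 3, 4)])  # -> [0, 60, 120, 180]
```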
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value but also comparing the clusters formed
to other values of α. Since we are increasing the value of α, more clusters will
be formed and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster
3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar
distribution of release years to cluster 9_0.05. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 grouping differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked
up more nuanced instrumentation and mood differences. For example, cluster 16_0.1
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together based on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR drum machines, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this run, I
added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters. 3 of these clusters contained only one song each, and upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR
drum machine sounds and is comparable to cluster 28_0.1. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing the clusters: the y-axis scales are quite small,
implying that many of the timbre values averaged out because the songs in each
cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table for α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings for
α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; I then offer potential paths for researchers to build upon my
experiment and closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to all of the electronic
music that was available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included contained
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept: although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayes Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed were nontrivial: I could not only
eyeball the measurements on each timbre graph, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successfully the Dirichlet
Process clusters songs into distinct categories.
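For the timbre-category count itself, the selection criterion described above (lowest Bayes Information Criterion, 46 categories in my run) can be sketched with scikit-learn's GaussianMixture and its bic method; the random frames below are placeholders for the sampled timbre vectors, and the search range is truncated for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
frames = rng.randn(300, 12)  # placeholder for sampled 12-dim timbre frames

# Fit a mixture for each candidate category count and keep the lowest BIC.
bics = {}
for n in range(1, 8):  # the real search would extend past 46 components
    gm = GaussianMixture(n_components=n, random_state=0).fit(frames)
    bics[n] = gm.bic(frames)
best_n = min(bics, key=bics.get)
print(best_n)
```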
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, and the dataset,
songs accessed from the dataset, and methods for comparing songs to each other are
established, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even further, and groups such as Spotify amass greater amounts of information
and deeper insights on that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
    'industrial', 'synthpop', 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print ('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer("{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile("{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03,
     -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02,
     1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00,
     2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01,
     6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00,
     9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03,
     3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01,
     -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02,
     2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,
     1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01,
     -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01,
     1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02,
     3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02,
     -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02,
     1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,
     4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02,
     2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01,
     -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02,
     1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02,
     -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04,
     2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02,
     -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02,
     3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01,
     -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02,
     3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01,
     1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02,
     -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00,
     -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00,
     -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01,
     3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04,
     4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00,
     2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01,
     1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03,
     -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02,
     6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02,
     -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02,
     -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02,
     1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01,
     -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00,
     5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00,
     5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,
     5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02,
     -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01,
     -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01,
     -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00,
     -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00,
     1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01,
     -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00,
     1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00,
     1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01,
     -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02,
     3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01,
     3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01,
     -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01,
     -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01,
     -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00,
     -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,
     6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02,
     -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01,
     2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01,
     2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01,
     5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02,
     1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00,
     -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02,
     -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00,
     2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00,
     -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01,
     -7.23286756e-03, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02,
     -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00,
     -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01,
     -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01,
     1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01,
     -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00,
     4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01,
     2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00,
     8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00,
     -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01,
     9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02,
     6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02,
     -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05,
     1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00,
     -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00,
     -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01,
     -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01,
     1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01,
     2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02,
     -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00,
     1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01,
     -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01,
     1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03,
     4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00,
     -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01,
     6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # (chord type, chord index): 1 = major, 2 = minor, 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # centered on the timbre vector's own mean, mirroring find_most_likely_chord
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(timbre_vector))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, mar 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, mar 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, oct 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, jan 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, jul 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric bayes and the dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, mar 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, mar 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, dec 2005.
To my parents
Contents
Abstract iii
Acknowledgements iv
List of Tables viii
List of Figures ix
1 Introduction 1
1.1 Background Information 1
1.2 Literature Review 3
1.3 The Dataset 10
2 Mathematical Modeling 12
2.1 Determining Novelty of Songs 12
2.2 Feature Selection 14
2.3 Collecting Data and Preprocessing Selected Features 20
2.3.1 Collecting the Data 20
2.3.2 Pitch Preprocessing 21
2.3.3 Timbre Preprocessing 25
3 Results 27
3.1 Methodology 27
3.2 Findings 29
3.2.1 α = 0.05 29
3.2.2 α = 0.1 33
3.2.3 α = 0.2 38
3.3 Analysis 46
4 Conclusion 53
4.1 Design Flaws in Experiment 53
4.2 Future Work 55
4.3 Closing Remarks 56
A Code 57
A.1 Pulling Data from the Million Song Dataset 57
A.2 Calculating Most Likely Chords and Timbre Categories 58
A.3 Code to Compute Timbre Categories 60
A.4 Helper Methods for Calculations 61
Bibliography 68
List of Tables
3.1 Song cluster descriptions for α = 0.05 33
3.2 Song cluster descriptions for α = 0.1 38
3.3 Song cluster descriptions for α = 0.2 45
List of Figures
1.1 A user's taste profile generated by Spotify 4
1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975) 8
2.1 scikit-learn example of GMM vs. DPGMM and tuning of α 15
2.2 Number of Electronic Music Songs in Million Song Dataset from Each Year 26
3.1 Song year distributions for α = 0.05 31
3.2 Timbre and pitch distributions for α = 0.05 32
3.3 Song year distributions for α = 0.1 35
3.4 Timbre and pitch distributions for α = 0.1 37
3.5 Song year distributions for α = 0.2 41
3.6 Timbre and pitch distributions for α = 0.2 44
Chapter 1
Introduction
1.1 Background Information
Electronic Music (EM) is an increasingly popular genre of music with an immense presence and influence on modern culture. Because the genre is new as a whole and is arguably more loosely structured than other genres - technology has enabled the creation of a wide range of sounds and easy blending of existing and new sounds alike - formal analysis, especially mathematical analysis, of the genre is fairly limited and has only begun growing in the past few years. As a fan of EM, I am interested in exploring how the genre has evolved over time. More specifically, my goal with this project was to design a structure or model that could help me identify which EM artists have contributed the most stylistically to the genre. Oftentimes famous EM artists do not create novel-sounding music but rather popularize an existing style, and the motivation of this study is to distinguish those who have stylistically contributed the most to the EM scene from those who have merely popularized aspects of it.

As the study progressed, the manner in which I constructed my model gave rise to a second goal for this thesis: suggesting new ways in which EM genres can be imagined.
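The clustering machinery behind this second goal, a Dirichlet Process Gaussian Mixture Model, does not fix the number of clusters in advance; the number of styles is allowed to grow with the data. A minimal sketch of the underlying idea, as a pure-Python Chinese restaurant process simulation (the function name, seed, and parameter values here are illustrative, not taken from my code):

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Simulate cluster assignments under a Chinese restaurant process:
    point i joins an existing cluster with probability proportional to that
    cluster's current size, and opens a new cluster with probability
    proportional to the concentration parameter alpha."""
    rng = random.Random(seed)
    counts = []   # counts[k] = number of points currently in cluster k
    labels = []
    for _ in range(n):
        weights = counts + [alpha]          # existing clusters, then "new cluster"
        r = rng.uniform(0.0, sum(weights))
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(counts):
            counts.append(1)                # open a brand-new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

# Larger alpha tends to produce more clusters; the cluster count is an
# outcome of the data rather than a fixed input.
labels_small, counts_small = crp_assignments(1000, alpha=0.1)
labels_large, counts_large = crp_assignments(1000, alpha=2.0)
```

This is why the Results chapter reports findings for several values of α: the concentration parameter directly controls how eagerly the model invents new song clusters.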
While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, and especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely because it has existed for a shorter amount of time and is less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not exist until 2012, when Professor Mark J. Butler from Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that relate every genre to the others and make it easy to observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped into more general genres and represents chronological evolutions by connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative. While each subgenre comes with an explanation of typical rhythms and sounds and includes well-known songs indicative of the style, the guide was created by someone drawing on historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach to imagining the entire landscape of EM. The results may confirm Ishkur's Guide's findings, in which case his guide is given additional merit with mathematical evidence, or they may differ, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically-based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that historically makes sense. On the other hand, my model is history-agnostic and may not capture the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and it may reveal musical styles that are more similar to each other than history would otherwise imply. This way of thinking about music genres, while unconventional, is another way of imagining EM.
The practice of quantitatively analyzing music has exploded in the last decade, thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what this research can contribute.
1.2 Literature Review
Quantitative analysis of music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. Looking first at the consumer-based research, Spotify and Pandora are two of the most prominent groups and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists, or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the clusters come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music as well as training various algorithms on a regular basis by "listening" to new songs [4][5].
Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres of music. This may serve as a good starting point to compare my final results to, but my study aims to be as context-free as possible, attaching no preconceived notions of music styles or genres and instead looking at features that could be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking in which skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study since it does not take any preconceived notion of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.
In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCC) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time music has been becoming louder and less diverse:
The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to the 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)
While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress I would be able to achieve at least one of the levels of abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons", or dictionaries of pitch and timbre terms that all songs can be compared to. For pitch, the original data is an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, music sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song.
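A small sketch of this chord-change bookkeeping, using hypothetical (quality, root) chord labels of the kind returned by the appendix's find_most_likely_chord (the sequence of labels is made up for illustration):

```python
from collections import Counter

# made-up per-segment chord labels: (quality, root) pairs, e.g. (1, 0)
# for a major chord rooted on pitch class 0
chords = [(1, 0), (1, 0), (2, 9), (1, 5), (1, 0), (2, 9)]

def chord_change_table(chords):
    """Count transitions between consecutive chords. Keying on the root
    change modulo 12, rather than the absolute roots, leaves the table
    unchanged if the whole song is transposed."""
    changes = Counter()
    for (q1, r1), (q2, r2) in zip(chords, chords[1:]):
        changes[(q1, q2, (r2 - r1) % 12)] += 1
    return changes

table = chord_change_table(chords)
```

Two songs can then be compared by their tables of chord-change frequencies rather than by absolute pitch content.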
Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords by which to compare songs' timbres. Mauch's study utilizes a Gaussian Mixture Model (GMM): it iterates over k=1 to k=N clusters, where N is a large number, runs the GMM with each prior assumption of k clusters, and computes the Bayes Information Criterion (BIC) for each model. The value of k yielding the lowest of the N BIC values is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
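This model-selection loop can be sketched with scikit-learn; the synthetic data and parameter choices below are illustrative only, not the study's or the thesis's actual settings:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# synthetic stand-in for 12-dimensional timbre frames: two well-separated blobs
rng = np.random.default_rng(0)
frames = np.vstack([
    rng.normal(0.0, 0.3, size=(200, 12)),
    rng.normal(3.0, 0.3, size=(200, 12)),
])

def best_gmm_by_bic(X, max_k=6):
    """Fit a GMM for each candidate number of clusters k and keep the
    model whose Bayes Information Criterion (BIC) is lowest."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_k + 1):
        model = GaussianMixture(n_components=k, random_state=0).fit(X)
        bic = model.bic(X)
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model

chosen = best_gmm_by_bic(frames)
# chosen.means_ holds, for each selected cluster, the mean value of each
# of the 12 timbre components
```

BIC trades goodness of fit against the number of parameters, so the loop stops rewarding extra clusters once they stop explaining meaningfully more of the data.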
For my research, I decided that the pitch and timbre lexicons were the most
realistic level of abstraction I could obtain. Mauch's study adds an additional layer
to pitch and timbre by identifying the most common patterns of chord changes and
most common timbre rhythms and creating more general tags from these combined
terms, such as "stepwise changes indicating modal harmony" for a pitch topic and
"oh, rounded, mellow" for a timbral topic. There were two problems with using this
final layer of abstraction for my study. First, attaching semantic interpretations to
the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen
to sound samples covering all of the different timbral categories I identified and
attach user interpretations to them. For the chords, not only would I have to
perform the same analysis as on timbre, but also take care to identify which
chords correspond to common sound progressions in popular music, a task that I am
not qualified for and did not have the resources, for this thesis, to seek out. Second,
this final layer of abstraction was not necessary for the end goal of my paper. In
fact, consolidating my pitch and timbre lexicons into simpler phrases would run the
risk of pigeonholing my analysis and preventing me from discovering more nuanced
patterns in my final results. Therefore, I decided on pitch and timbral lexicon
construction as the furthest level of abstraction when processing songs for my
thesis. Mathematical details on how I constructed the pitch and timbral lexicons
can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I needed access to an extensive database
of music. Until recently, acquiring a substantial corpus of music data was a difficult
and costly task. It is illegal to download music audio files from video and music-sharing
sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer
90-second previews of songs, but segments of songs, usually segments that showcase
the chorus, are not reliable measures of the entire essence of a song. Even if I were
to legally download entire audio files for free, I would
run into additional issues. Obtaining a high-quality corpus of song data would be
challenging: scripts that crawl music-sharing platforms may not capture all of the
music I am looking for. And once I had the audio files, I would have to apply audio
processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem.
The Million Song Dataset (MSD) is a collection of metadata for one million music
tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz,
7digital, and Last.fm, have contributed different pieces of metadata. Each song is
represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a
JSON-like object. The fields encompass topical features, such as the song title, artist,
and release date, as well as lower-level features, such as the loudness, starting beat
time, pitches, and timbre of several segments of the song [9]. While the MSD is
the largest free and open-source music metadata dataset I could find, there is no
guarantee that it adequately covers the entire spectrum of EM artists and songs.
This quality limitation is important to consider throughout the study. A quick look
through the songs, including the subset of data I worked with for this report, showed
several well-known artists and songs from the EM scene. Therefore, while the MSD
may not contain all desired songs for this project, it contains an adequate number
of relevant songs to produce meaningful results. Additionally, the groundwork for
modeling the similarities between songs and identifying groundbreaking ones is the
same regardless of the songs included, and the following methodologies can be
implemented on any similarly formatted dataset, including one with songs that are
currently missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be,
an important aspect of my research. My problem, how to mathematically determine
which songs were unique for their time, requires an algorithm in which each song is
introduced in chronological order, either joining an existing category or starting a
new category based on its musical similarity to songs already introduced. Clustering
algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the
partitioning of a dataset into a predetermined number of clusters, assume that
number is fixed. While this process would work if we knew exactly how many genres
of EM existed, if we guess wrong our end results may contain clusters that are
wrongly grouped together or separated. It is much better to apply a clustering
algorithm that does not make any assumptions about this number.
One particularly promising process that addresses the issue of the number of
clusters is a family of models known as Dirichlet Processes (DPs). DPs are
useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters, and (2) by sorting the songs
in chronological order before running the algorithm and keeping track of which
songs are categorized under each cluster, we can observe the earliest songs in each
cluster and consequently infer which songs were responsible for creating new
clusters. The DP is controlled by a concentration parameter α. The expected
number of clusters formed is directly proportional to the value of α, so the higher
the value of α, the more likely new clusters are to be formed [10]. Regardless of the
value of α, as the number of data points introduced increases, the probability of a
new group being formed decreases. That is, a "rich get richer" policy is in place,
and existing clusters tend to grow in size. Tweaking the tunable parameter α is an
important part of the study, since it determines the flexibility given to forming a
new cluster. If the value of α is too small, then the criteria for forming clusters will
be too strict, and data that should be in different clusters will be assigned to the
same cluster. On the other hand, if α is too large, the algorithm will be too sensitive
and will assign similar songs to different clusters.
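The rich-get-richer behavior and the role of α can be illustrated with a small simulation of the Chinese Restaurant Process, one standard construction of the DP. The function and the values below are illustrative, not part of the thesis pipeline:

```python
import random

def crp_cluster_counts(n_points, alpha, seed=0):
    """Simulate a Chinese Restaurant Process: each new point joins an
    existing cluster with probability proportional to its size, or starts
    a new cluster with probability proportional to alpha."""
    rng = random.Random(seed)
    counts = []                       # counts[i] = size of cluster i
    for n in range(n_points):         # n = number of points seen so far
        r = rng.uniform(0, n + alpha)
        if r < alpha:                 # start a new cluster
            counts.append(1)
        else:                         # join an existing one: rich get richer
            r -= alpha
            for i, c in enumerate(counts):
                if r < c:
                    counts[i] += 1
                    break
                r -= c
    return counts

few = len(crp_cluster_counts(2000, alpha=0.5))
many = len(crp_cluster_counts(2000, alpha=50.0))
print(few, many)   # a larger alpha yields many more clusters
```

Note how the new-cluster probability α/(n + α) shrinks as n grows, which is exactly the decreasing probability of new groups described above.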
The implementation of the DP was achieved using scikit-learn's library and API for
the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the
Dirichlet Process model used to cluster the data. More specifically, scikit-learn's
implementation of the DPGMM uses the stick-breaking method, one of several
equally valid methods of assigning songs to clusters [11]. While the mathematical
details of this algorithm can be found in the following citation [12], the most
important aspects of the DPGMM are the arguments that the user can specify and
tune. The first of these tunable parameters is the value α, the same parameter as
the α discussed in the previous paragraph. As seen on the right side of Figure 2.1,
properly tuning α is key to obtaining meaningful clusters. The
center image has α set to 0.01, which is too small and results in all of the data being
placed in one cluster. On the other hand, the bottom-right image has the same
data set and α set to 100, which does a better job of clustering. On a related note,
the figure also demonstrates the effectiveness of the DPGMM over the GMM. On
the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left
image assumes 5 clusters as a prior and consequently clusters the data incorrectly,
while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs to the DPGMM is the data that will be
clustered. The scikit-learn implementation takes the data in the format of a nested
list (N lists, each of length m), where N is the number of data points and m the
number of features. While the format of the data structure is relatively
straightforward, choosing which numbers should be in the data was a challenge I
faced. Selecting the relevant features of each song to be used in the algorithm is
expounded upon in the next section, "Feature Selection."
The last argument that a user inputs to the scikit-learn DPGMM implementation
is the upper bound on the number of clusters. The Dirichlet Process then
determines the best number of clusters for the data between 1 and the upper
bound. Since the DPGMM is flexible enough to find the best value, I set an
arbitrary upper bound of 50 clusters and focused on tuning α to modify the
number of clusters formed.
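The DPGMM class used here was later deprecated and removed from scikit-learn; in current releases the same stick-breaking Dirichlet Process model is exposed as BayesianGaussianMixture with weight_concentration_prior_type='dirichlet_process'. A minimal sketch on toy data, where all parameter values and the toy feature dimension are illustrative rather than the thesis's actual settings:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Toy "songs": two clearly distinct groups (5 features here, 238 in the thesis)
X = np.vstack([rng.normal(0, 0.3, (150, 5)), rng.normal(4, 0.3, (150, 5))])

# n_components is only an upper bound; the stick-breaking prior decides
# how many clusters actually receive data
dpgmm = BayesianGaussianMixture(
    n_components=10,                                # upper bound (thesis uses 50)
    weight_concentration_prior=0.1,                 # the alpha parameter
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
    max_iter=500,
).fit(X)

labels = dpgmm.predict(X)
n_used = len(np.unique(labels))
print(n_used)   # far fewer clusters than the upper bound of 10
```

The input is an N-by-m array, matching the nested-list format described above, and only a handful of the 10 allowed components end up with any songs assigned.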
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features
to be used for clustering. In other words, when we organize the songs into clusters,
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and
intuitively logical. In the Million Song Dataset [9], each song is represented as a
JSON-like object containing several fields. These fields are candidate features for
the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up"
by Rick Astley, and the corresponding features:
artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID
for this artist)
artist_mbtags shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count shape = (4)
(raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name Rick Astley (artist name)
artist_playmeid 1338 (the ID of that artist on the service playme.com)
artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest
(number between 0 and 1))
artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest
(number between 0 and 1))
15
audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for
the analysis by The Echo Nest)
bars_confidence shape = (99) (confidence value (between 0 and 1) associated
with each bar by The Echo Nest)
bars_start shape = (99) (start time of each bar according to The Echo Nest this
song has 99 bars)
beats_confidence shape = (397) (confidence value (between 0 and 1) associated
with each beat by The Echo Nest)
beats_start shape = (397) (start time of each beat according to The Echo Nest
this song has 397 beats)
danceability 0.0 (danceability measure of this song according to The Echo Nest
(between 0 and 1, 0 => not analyzed))
duration 211.69587 (duration of the track in seconds)
end_of_fade_in 0.139 (time of the end of the fade-in at the beginning of the
song according to The Echo Nest)
energy 0.0 (energy measure (not in the signal processing sense) according to The
Echo Nest (between 0 and 1, 0 => not analyzed))
key 1 (estimation of the key the song is in by The Echo Nest)
key_confidence 0.324 (confidence of the key estimation)
loudness -7.75 (general loudness of the track)
mode 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence 0.434 (confidence of the mode estimation)
release Big Tunes - Back 2 The 80s (album name from which the track was taken;
some tracks can come from many albums; we give only one)
release_7digitalid 786795 (the ID of the release (album) on the service
7digital.com)
sections_confidence shape = (10) (confidence value (between 0 and 1) associated
with each section by The Echo Nest)
sections_start shape = (10) (start time of each section according to The Echo
Nest this song has 10 sections)
segments_confidence shape = (935) (confidence value (between 0 and 1)
associated with each segment by The Echo Nest)
segments_loudness_max shape = (935) (max loudness during each segment)
segments_loudness_max_time shape = (935) (time of the max loudness
during each segment)
segments_loudness_start shape = (935) (loudness at the beginning of each
segment)
segments_pitches shape = (935 12) (chroma features for each segment (normal-
ized so max is 1))
segments_start shape = (935) (start time of each segment ( musical event or
onset) according to The Echo Nest this song has 935 segments)
segments_timbre shape = (935 12) (MFCC-like features for each segment)
similar_artists shape = (100) (a list of 100 artists (their Echo Nest ID) similar
to Rick Astley according to The Echo Nest)
song_hotttnesss 0.864248830588 (according to The Echo Nest, when downloaded
(in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can
be associated with many tracks (with very slight audio differences))
start_of_fade_out 198.536 (start time of the fade-out in seconds at the end
of the song according to The Echo Nest)
tatums_confidence shape = (794) (confidence value (between 0 and 1) associated
with each tatum by The Echo Nest)
tatums_start shape = (794) (start time of each tatum according to The Echo
Nest this song has 794 tatums)
tempo 113.359 (tempo in BPM according to The Echo Nest)
time_signature 4 (time signature of the song according to The Echo Nest, i.e.,
usual number of beats per bar)
time_signature_confidence 0.634 (confidence of the time signature estimation)
title Never Gonna Give You Up (song title)
track_7digitalid 8707738 (the ID of this song on the service 7digital.com)
track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track,
on which the analysis was done)
year 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely
yield meaningful results yet also be simple and make sense to the average person.
The definition of "meaningful" results is subjective, as every music listener has his
or her own opinion as to what constitutes different types of music, but some
common features by which most people tend to differentiate songs are pitch,
rhythm, and the types of instruments used. The following fields provided in each
song object fall under these three terms:
Pitch
- segments_pitches: a matrix of values indicating the strength of each pitch (or
note) at each discernible time interval
Rhythm
- beats_start: a vector of values indicating the start time of each beat
- time_signature: the time signature of the song
- tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
- segments_timbre: a matrix of values indicating the distribution of MFCC-like
features (different types of tones) for each segment
The segments_pitches feature is a clear candidate as a differentiating factor for
songs, since it reveals patterns of notes that occur. Additionally, other research
papers that quantitatively examine songs, like Mauch's, look at pitch and employ a
procedure that allows all songs to be compared with the same metric. Likewise,
timbre is intuitively a reliable differentiating feature, since it captures the presence
of different tones: sounds that sound different despite having the same pitch.
Therefore, segments_timbre is another feature considered for each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these
features appear useful, as they indicate the rhythm of a song in one way or
another. However, none of these features is as useful as the pitch and timbre
features. While tempo is one factor in differentiating genres of EDM, and music in
general, tempo alone is not a driving force of musical innovation. Certain genres of
EDM, like drum 'n' bass and happycore, stand out for having very fast tempos,
but the tempo is supplemented with a sound unique to the genre. Conceiving new
arrangements of pitches, combining instruments in new ways, and inventing new
types of sounds are novel; speeding up or slowing down existing sounds is not.
Including tempo as a feature could actually add noise to the model, since many
genres overlap in their tempos. And finally, tempo is measured indirectly when the
pitch and timbre features are normalized for each song: everything is measured in
units of "per second," so faster songs will have higher quantities of pitch and
timbre features each second. Time signature can be dismissed from the candidate
features for the same reason as tempo: many genres share the same time signature,
and including it in the feature set would only add more noise. beats_start looks
like a more promising feature since, like segments_pitches and segments_timbre,
it consists of a vector of values. However, difficulties arise when we begin to think
how exactly
we can utilize this information. Since songs vary in length, we need a way to
compare songs of different durations on the same level. One approach could be to
perform basic statistics on the distance between each beat, for example calculating
the mean and standard deviation of this distance. However, the normalized pitch
and timbre information already captures this data. Another possibility is detecting
certain patterns of beats, which could differentiate the syncopated dubstep or
glitch beats from the steady pulse of electro-house. But once again, every beat is
accompanied by a sound with a specific timbre and pitch, so this feature would not
add any significantly new information.
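The mean/standard-deviation computation on inter-beat distances mentioned above is simple to state; a sketch with a hypothetical beats_start vector for a perfectly steady 120 BPM pulse:

```python
import numpy as np

# Hypothetical beats_start vector (seconds): a steady 120 BPM pulse for 60 s
beats_start = np.arange(0.0, 60.0, 0.5)

# Inter-beat intervals and their summary statistics
intervals = np.diff(beats_start)
mean_ibi = intervals.mean()           # mean inter-beat interval in seconds
std_ibi = intervals.std()             # ~0 for a perfectly steady pulse
tempo_bpm = 60.0 / mean_ibi           # implied tempo in BPM

print(mean_ibi, std_ibi, tempo_bpm)
```

As the text argues, these two summary numbers carry little information beyond what the per-second normalization of pitch and timbre already encodes.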
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding on the features I wanted to use in my research, I first needed to
collect all of the electronic songs in the Million Song Dataset. The easiest reliable
way to achieve this was to iterate through each song in the database and save the
information for songs where any of the artist genre tags in artist_mbtags matched
an electronic music genre. While this measure was not fully accurate, because it
looks at the genre of the artist, not the song, specific genre information for each
song was not as easily accessible, so this indicator was nearly as good a substitute.
To generate a list of the genres that electronic songs would fall under, I manually
searched through a subset of the MSD to find all genres that seemed to be related
to electronic music. In the case of genres that are sometimes but not always
electronic in nature, such
as disco or pop, I erred on the side of caution and did not include them in the list
of electronic genres. In these cases, false positives, such as primarily rock songs
that happen to have the disco label attached to the artist, could inadvertently be
included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
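The tag filter itself is a one-liner; a small sketch with hypothetical song records, where the dictionary layout is illustrative and not the MSD's actual HDF5 schema:

```python
# Abbreviated version of the full target_genres list
target_genres = {'house', 'techno', 'trance', 'dubstep', 'ambient',
                 'electronic'}

# Hypothetical song records carrying artist-level musicbrainz tags
songs = [
    {'title': 'Song A', 'artist_mbtags': ['techno', 'german']},
    {'title': 'Song B', 'artist_mbtags': ['rock', 'pop']},
    {'title': 'Song C', 'artist_mbtags': ['Ambient', 'downtempo']},
]

def is_electronic(song):
    # Keep a song if ANY of its artist tags matches an electronic genre
    return any(tag.lower() in target_genres for tag in song['artist_mbtags'])

electronic = [s['title'] for s in songs if is_electronic(s)]
print(electronic)   # ['Song A', 'Song C']
```

Note that Song B is dropped even though, as the text warns, an artist-level filter like this can still admit false positives when a mostly non-electronic artist carries one electronic tag.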
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a
musically informed manner. The study first takes the raw sound data and converts
it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the
strongest detection. Then it computes the most likely chord by comparing the 4
most common types of chords in popular music (major, minor, dominant 7, and
minor 7) to the observed chord. The most common chords are represented as
"template chords" containing 0's and 1's, where the 1's represent the notes played
in the chord. For example, using the note C as the first index, the C major chord
is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, the Spearman's rho coefficient is
computed against every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − mean(CT)) (c_i − mean(c)) / (σ_CT σ_c)
where mean(CT) is the mean of the values in the template chord, σ_CT is the
standard deviation of the values in the chord, and the operations on c are
analogous. Note that the summation is over each of the 12 pitch classes. The
chord template with the highest value of ρ is selected as the chord for the time
frame. After this is performed for each time frame, the values are smoothed, and
then the change between adjacent chords is observed. The reasoning behind this
step is that, by measuring the relative distance between chords rather than the
chords themselves, all songs can be compared in the same manner even though
they may have different key signatures. Finally, the study takes the types of chord
changes and classifies them under 8 possible categories called "H-topics." These
topics are more abstracted versions of the chord changes that make more sense to
a human, such as "changes involving dominant 7th chords."
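The template-matching step can be sketched as follows, using the correlation formula above (scaled by a constant that does not affect which template wins). The template set follows the four chord types listed earlier, while the chroma frame is made up for illustration:

```python
import numpy as np

# Binary chord templates with root at index 0, C as the first pitch class
TEMPLATES = {
    'maj':  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'min':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'dom7': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    'min7': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def correlate(t, c):
    # Correlation of a template with a chroma frame (constant 1/12 factor
    # added so this is an ordinary correlation coefficient)
    t, c = np.asarray(t, float), np.asarray(c, float)
    return ((t - t.mean()) * (c - c.mean())).sum() / (12 * t.std() * c.std())

def best_chord(chroma):
    """Return (root, type) of the template most correlated with a frame."""
    best, best_rho = None, -np.inf
    for name, tmpl in TEMPLATES.items():
        for root in range(12):                    # try all 12 transpositions
            rho = correlate(np.roll(tmpl, root), chroma)
            if rho > best_rho:
                best, best_rho = (NOTES[root], name), rho
    return best

# A frame dominated by C, E, and G should register as C major
chroma = [1.0, .1, .1, .1, .9, .1, .1, .8, .1, .1, .1, .1]
print(best_chord(chroma))   # ('C', 'maj')
```

Transposing the same frame up by two semitones yields D major instead, which is exactly why the study then compares chord *changes* rather than absolute chords.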
In my preliminary implementation of this method on an electronic dance music
corpus, I made a few modifications to Mauch's study. First, I smoothed out time
frames before computing the most probable chords, rather than smoothing the
most probable chords. I did this to save time and to reduce volatility in the chord
measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference,
which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly
under 1 second and, for preliminary testing, appeared to be a good interval for
each time block. Second, as mentioned in the literature section, I did not abstract
the chord changes into H-topics. This decision also stemmed from time constraints,
since deriving semantic chord meaning from EDM songs would require careful
research into the types of harmonies and sounds common in that genre of music.
Below is a high-level visualization of the pitch metadata found in a sample song,
"Firestarter" by The Prodigy, and how I converted the metadata into a chord
change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: converting the pitch metadata of "Firestarter" by The Prodigy into a
chord change vector. (1) Start with the raw pitch data, an N-by-12 matrix, where
N is the number of time frames in the song and 12 the number of pitch classes;
the figure shows the first 5 time frames. (2) Average the distribution of pitches
over every block of 5 time frames. (3) Calculate the most likely chord for each
block using Spearman's rho (the first block resolves to F major). (4) For each pair
of adjacent chords, calculate the change between them (e.g., F major to G major,
a major-to-major shift of 2 steps, chord shift code 6) and increment the count in a
table of chord change frequencies (192 possible chord changes). The result is a
192-element vector in which chord_changes[i] is the number of times the chord
change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by
the length of the song, so that each song's chord changes were measured per second.
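The counting and per-second normalization can be sketched as follows. The thesis does not spell out its exact 192-code numbering, so the encoding below (4 from-types × 4 to-types × 12 root intervals = 192) is one plausible scheme, and the chord sequence and duration are made up:

```python
import numpy as np

TYPES = ['maj', 'min', 'dom7', 'min7']

def change_code(prev, curr):
    """Map a chord change to one of 4*4*12 = 192 codes. The layout here is
    illustrative; the thesis only fixes the total of 192 categories."""
    (r1, t1), (r2, t2) = prev, curr
    interval = (r2 - r1) % 12                  # root movement in semitones
    return (TYPES.index(t1) * 4 + TYPES.index(t2)) * 12 + interval

# Hypothetical smoothed chord sequence (root index, type) and song duration
chords = [(5, 'maj'), (7, 'maj'), (5, 'maj'), (9, 'min')]   # F, G, F, Am
duration = 212.0                                            # seconds

chord_changes = np.zeros(192)
for prev, curr in zip(chords, chords[1:]):
    chord_changes[change_code(prev, curr)] += 1

chord_changes /= duration        # normalize to changes per second
print(chord_changes.sum())       # 3 changes spread over 212 seconds
```

The resulting 192-element vector is the per-song pitch feature block fed to the clustering step.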
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8]. After collecting all song metadata, I took a random
sample of 20 songs from each year, starting at 1970. The reason I forced the
sampling to 20 randomly sampled songs from each year, and did not take a random
sample of songs from all years at once, was to prevent bias towards any type of
sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011
than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003,
and the standard deviation of the years is σ = 7.060. A "random sample" over all
songs would almost certainly include a disproportionate number of more recent
songs. In order not to miss sounds that may be more prevalent in older songs, I
required a set number of songs from each year. Next, from each randomly selected
song, I selected 20 random timbre frames in order to prevent any biases in data
collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre
frames collected. Next, I clustered the timbre frames using a Gaussian Mixture
Model (GMM), varying the number of clusters from 10 to 100 and selecting the
number of clusters with the lowest Bayes Information Criterion (BIC), a statistical
measure commonly used to select the best-fitting model. The BIC was minimized
at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean
values of each of the 12 timbre components for each cluster formed. In the same
way that every song had the same 192 chord changes whose frequencies could be
compared between songs, each song now had the same 46 timbre clusters but
different frequencies in each song. When reading in the metadata from each song,
I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from
Each Year
a frequency count of all of the timbre clusters observed in the song. Finally, as
with the pitch data, I divided all observed counts by the duration of the song in
order to normalize each song's timbre counts.
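The per-song counting step can be sketched as follows; a 3-cluster lexicon on synthetic frames stands in for the 46-cluster one, and the "song" and its duration are made up:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Corpus-level step: fit the timbre lexicon (46 clusters in the thesis, 3 here)
lexicon_frames = np.vstack([rng.normal(m, 0.4, (100, 12)) for m in (0, 4, 8)])
lexicon = GaussianMixture(n_components=3, random_state=0).fit(lexicon_frames)

# Per-song step: assign each timbre frame to its most likely cluster,
# count cluster frequencies, and normalize by the song's duration
song_frames = rng.normal(4, 0.4, (935, 12))   # a song living near one sound
duration = 212.0
assignments = lexicon.predict(song_frames)
timbre_freq = np.bincount(assignments, minlength=3) / duration

print(timbre_freq.round(3))
```

The resulting vector (46 entries in the thesis) is the timbre feature block, directly comparable across songs because every song is scored against the same fitted lexicon.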
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the
data. For each song, I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list, giving each song a total of 238
features. However, there is a problem with this setup: the pitch data will
inherently dominate the clustering process, since it contains almost 3 times as
many features as timbre. While there is no built-in function in scikit-learn's
DPGMM process to give different weights to each feature, I considered another
possibility to remedy this discrepancy: duplicating the timbre vector a certain
number of times and concatenating the copies to the feature set of each song.
While this strategy runs the risk of corrupting the feature set and turning it into
something that does not accurately represent each song, it is important to keep in
mind that, even without duplicating the timbre vector, the feature set consists of
two separate feature sets concatenated to each other. Therefore, timbre
duplication appears to be a reasonable strategy to weigh pitch and timbre more
evenly.
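A minimal sketch of the duplication strategy with NumPy; the choice of 4 copies (giving 184 timbre features against 192 pitch features) is illustrative, since the duplication count is not fixed here, and the random vectors stand in for a real song's features:

```python
import numpy as np

rng = np.random.default_rng(3)
pitch_vec = rng.random(192)      # per-second chord change frequencies
timbre_vec = rng.random(46)      # per-second timbre cluster frequencies

# Repeat the timbre block so its feature count roughly matches pitch:
# 4 copies of 46 features = 184, close to the 192 pitch features
weighted_features = np.concatenate([pitch_vec, np.tile(timbre_vec, 4)])

print(weighted_features.shape)   # (376,)
```

Because Euclidean-style mixture models weigh every coordinate equally, repeating a block of features is an easy, if crude, way to increase that block's influence on the clustering.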
After this modification, I tweaked a few more parameters before obtaining my
final results. Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second, but it also had the undesired effect
of making the data too small. Timbre and pitch frequencies per second were
almost always less than 10 and often hovered as low as 0.002 for nonzero values.
Because all of the values were very close to each other, using common values of α
in the range of 0.1 to 1,000-2,000 was insufficient to push the songs into different
clusters. As a result, every song fell into the same cluster. Increasing the value of
α by several orders of magnitude, to well over 10 million, fixed the problem, but
this solution presented two problems. First, tuning α to experiment with different
ways to cluster the music would be problematic, since I would have to work with
an enormous range of possible values for α. Second, pushing α to such high values
is not appropriate for the Dirichlet Process. Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters, but a
value of α that high is in principle always assigning each new song to a new
cluster. On the other hand, varying α between 0.1 and 1,000, for example, presents
a much wider range of flexibility when assigning clusters. While clustering may be
possible by varying α an extreme amount with the data as it currently is, we would
be using the Dirichlet Process in a way it mathematically should not be used.
Therefore, multiplying all of the data by a constant value, so that we can work in
the appropriate range of α, is the ideal approach. After some experimentation, I
found that k = 10 was an appropriate scaling factor. After initial runs of the
Dirichlet Process, I found that there was a slight issue with some of the earlier
songs. Since I had only artist genre tags, not song-specific tags, I chose songs based
on whether any of the tags associated with the artist fell under any electronic
music genre, including the generic term 'electronic'. There were some bands,
mostly older ones from the 1960s and 1970s, like Electric Light Orchestra, which
had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists
featured mostly non-electronic songs, I decided to exclude them from my study
and generate a blacklist of these artists. While it was infeasible to look through
every single song and determine whether it was electronic or not, I was able to
look over the earliest songs in each cluster. These songs were the most important
to verify as electronic, because early non-electronic songs could form new clusters
and inadvertently create clusters with non-electronic sounds that I was not
looking for.
The goal of this thesis is to identify the different groups into which EM songs cluster and to identify the most unique artists and genres. While the second task is straightforward, since it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord-change and timbre-category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = 0.05, 0.1, and 0.2 and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit on the number of components, or clusters, to 50. The values of α I used resulted in 9, 14, and 19 clusters, respectively.
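As a rough sketch of this procedure (not the thesis's actual pipeline), scikit-learn's `BayesianGaussianMixture` with a Dirichlet-process prior caps the number of components at 50 while the concentration parameter is varied; the random matrix `X` below is a stand-in for the real chord-change/timbre feature set.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
X = rng.rand(500, 10)  # placeholder feature matrix (songs x features), not MSD data

cluster_counts = {}
for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=50,                      # upper limit on clusters, as in the text
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,     # the concentration parameter being varied
        max_iter=200,
        random_state=0,
    )
    labels = dpgmm.fit_predict(X)
    # the DP prior leaves many of the 50 components empty, so the number of
    # occupied clusters is typically far below the cap
    cluster_counts[alpha] = len(set(labels))
```

With real features, larger α tends to occupy more components, matching the 9/14/19 progression reported above.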
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α I also calculated the average frequency of each chord-change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
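Computing the per-cluster averages behind plots like these is straightforward once the Dirichlet Process has assigned labels; a minimal sketch with illustrative numbers (not MSD data):

```python
import numpy as np

# X: feature matrix (songs x features), labels: cluster assignment per song.
# The numbers here are toy stand-ins for the chord-change/timbre features.
X = np.array([[1.0, 0.0],
              [3.0, 0.0],
              [0.0, 2.0]])
labels = np.array([0, 0, 1])

# average feature vector for each cluster
cluster_means = {k: X[labels == k].mean(axis=0) for k in np.unique(labels)}
```

Plotting each cluster's mean vector, with the chord-change coordinates in one color and the timbre coordinates in another, reproduces the kind of chart shown in the figures.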
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster Song Count Characteristic Sounds
0 6481 Minimalist, industrial, space sounds, dissonant chords
1 5482 Soft, New Age, ethereal
2 2405 Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 360 Very dense and complex synths, slightly darker tone
4 4550 Heavily distorted rock and synthesizer
6 2854 Faster-paced 80s synth rock, acid house
8 798 Aggressive beats, dense house music
9 1464 Ambient house, trancelike, strong beats, mysterious tone
11 1597 Melancholy tones; New wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since neither sounded unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster Song Count Characteristic Sounds
0 1339 Instrumental and disco with 80s synth
1 2109 Simultaneous quarter-note and sixteenth-note rhythms
2 4048 Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 1353 Strong repetitive beats, ambient
4 2446 Strong simultaneous beat and synths; synths defined but echo
5 2672 Calm, New Age
6 542 Hi-hat cymbals, dissonant chord progressions
7 2725 Aggressive punk and alternative rock
9 1647 Latin, rhythmic emphasis on first and third beats
11 835 Standard medium-fast rock instruments/chords
16 1152 Orchestral, especially violins
18 40 "Martian alien" sounds, no vocals
20 1590 Alternating strong kick and strong high-pitched clap
28 528 Roland TR-like beats; kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, a total of 22 clusters were formed. Three of the clusters consisted of one song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster Song Count Characteristic Sounds
0 4075 Nostalgic and sad-sounding synths and string instruments
1 2068 Intense, sad, cavernous (mix of industrial metal and ambient)
2 1546 Jazz/funk tones
3 1691 Orchestral with heavy 80s synths, atmospheric
4 343 Arpeggios
5 304 Electro, ambient
6 2405 Alien synths, eerie
7 1264 Punchy kicks and claps, 80s/90s tilt
8 1561 Medium tempo, 4/4 time signature, synths with intense guitar
9 1796 Disco rhythms and instruments
10 2158 Standard rock with few (if any) synths added on
12 791 Cavernous, minimalist ambient (non-electronic instruments)
14 765 Downtempo, classic guitar riffs, fewer synths
16 865 Classic acid house sounds and beats
17 682 Heavy Roland TR sounds
22 14 Fast ambient, classic orchestral
23 578 Acid house with funk tones
30 31 Very repetitive rhythms, one or two tones
34 88 Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists among the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced sixteenth-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre-category and chord-change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations gathered from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord-change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
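The chord-change category numbers cited in this section (types 0, 60, 120, and 180) follow directly from the encoding used in the appendix code (A.2), where key_shift = 4*(type1 - 1) + type2 and chord_shift = 12*(key_shift - 1) + note_shift; a quick check:

```python
# Chord types: 1 = major, 2 = minor, 3 = dominant 7th major, 4 = dominant 7th minor.
# note_shift is 0-11 semitones, so the encoding yields 12 * 16 = 192 categories.
def chord_change_category(type1, type2, note_shift):
    key_shift = 4 * (type1 - 1) + type2
    return 12 * (key_shift - 1) + note_shift

# Same-type transitions with no note change land exactly on 0, 60, 120, 180:
same_type = [chord_change_category(t, t, 0) for t in (1, 2, 3, 4)]
```

These four categories are the "stay in the same chord" transitions, which is why they dominate most clusters' chord-change plots.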
While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between clusters will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord-change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also illustrate the difficulty in distinguishing different clusters. The y-axis values on these charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment, along with closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the measurements on each graph for timbre, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
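As one simple illustration of the kind of timbre-based song comparison suggested by [15], two songs' duration-normalized timbre-category histograms could be compared with cosine similarity. This is a hypothetical stand-in, not the method of that study or of this thesis:

```python
import numpy as np

def timbre_similarity(hist_a, hist_b):
    # Cosine similarity between two timbre-category histograms:
    # 1.0 for identical profiles, 0.0 for non-overlapping ones.
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    return float(a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A score like this could supplement the cluster assignments with a direct, quantitative song-to-song comparison.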
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, with the dataset improved, songs accessed from the dataset, and methods for comparing songs to each other established, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C natural
# and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-03, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-03, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.
Contents
Abstract iii
Acknowledgements iv
List of Tables viii
List of Figures ix
1 Introduction 1
1.1 Background Information 1
1.2 Literature Review 3
1.3 The Dataset 10
2 Mathematical Modeling 12
2.1 Determining Novelty of Songs 12
2.2 Feature Selection 14
2.3 Collecting Data and Preprocessing Selected Features 20
2.3.1 Collecting the Data 20
2.3.2 Pitch Preprocessing 21
2.3.3 Timbre Preprocessing 25
3 Results 27
3.1 Methodology 27
3.2 Findings 29
3.2.1 α = 0.05 29
3.2.2 α = 0.1 33
3.2.3 α = 0.2 38
3.3 Analysis 46
4 Conclusion 53
4.1 Design Flaws in Experiment 53
4.2 Future Work 55
4.3 Closing Remarks 56
A Code 57
A.1 Pulling Data from the Million Song Dataset 57
A.2 Calculating Most Likely Chords and Timbre Categories 58
A.3 Code to Compute Timbre Categories 60
A.4 Helper Methods for Calculations 61
Bibliography 68
List of Tables
3.1 Song cluster descriptions for α = 0.05 33
3.2 Song cluster descriptions for α = 0.1 38
3.3 Song cluster descriptions for α = 0.2 45
List of Figures
1.1 A user's taste profile generated by Spotify 4
1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975 8
2.1 scikit-learn example of GMM vs. DPGMM and tuning of α 15
2.2 Number of Electronic Music Songs in Million Song Dataset from Each Year 26
3.1 Song year distributions for α = 0.05 31
3.2 Timbre and pitch distributions for α = 0.05 32
3.3 Song year distributions for α = 0.1 35
3.4 Timbre and pitch distributions for α = 0.1 37
3.5 Song year distributions for α = 0.2 41
3.6 Timbre and pitch distributions for α = 0.2 44
Chapter 1
Introduction
1.1 Background Information
Electronic Music (EM) is an increasingly popular genre of music with an immense presence and influence on modern culture. Because the genre is new as a whole and is arguably more loosely structured than other genres (technology has enabled the creation of a wide range of sounds and easy blending of existing and new sounds alike), formal analysis, especially mathematical analysis, of the genre is fairly limited and has only begun growing in the past few years. As a fan of EM, I am interested in exploring how the genre has evolved over time. More specifically, my goal with this project was to design a structure or model that could help me identify which EM artists have contributed the most stylistically to the genre. Oftentimes famous EM artists do not create novel-sounding music but rather popularize an existing style; the motivation of this study is to distinguish those who have stylistically contributed the most to the EM scene from those who have merely popularized aspects of it.

As the study progressed, the manner in which I constructed my model lent itself to a second goal of the thesis: suggesting new ways in which the genealogy of and relations between EM genres can be imagined.
While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely because it has existed for a shorter amount of time and is less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not appear until 2012, when Professor Mark J. Butler of Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that relate every genre to the others and make it easy to observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped into more general genres and represents chronological evolutions by connecting the genres in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative: although each subgenre entry explains typical rhythms and sounds and lists well-known songs indicative of the style, the guide is built on its creator's historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach toward imagining the entire landscape of EM. The results may confirm the findings of Ishkur's Guide, in which case the guide gains additional merit from mathematical evidence, or they may differ, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically grounded scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that makes historical sense. My model, on the other hand, is history-agnostic and may not account for the historical context of songs when clustering. However, I believe that there is still significant merit to my approach: instead of classifying genres of music by the earlier genres that led to them, it gives the most credit to the artists and songs that were the most innovative for their time, and it may reveal musical styles that are more similar to each other than history would otherwise imply. This way of thinking about music genres, while unconventional, is another way of imagining EM.
The practice of quantitatively analyzing music has exploded in the last decade, thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review, I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song, artist, or album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what it can contribute.
1.2 Literature Review
The quantitative analysis of music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. Looking first at the consumer-based research, Spotify and Pandora are two of the most prominent groups, and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often the user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the clusters come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music as well as by training various algorithms on a regular basis by "listening" to new songs [4][5].
Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres. This may serve as a good point of comparison for my final results, but my study aims to be as context-free as possible, attaching no preconceived notions of music styles or genres and instead looking at features that can be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking in which skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing for the aim of my study, since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have ten years to perform such an analysis like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can rely only on its abstract concepts to drive my study.
In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but a few studies perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955 to 2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs, each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined by the standard musical notes, or the frequency of the sound waves. Timbre is formally defined as the Mel-frequency cepstral coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound's color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how loud the music intrinsically sounds, not the loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time music has become louder and less diverse:

The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset removes the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining their distributions among songs over time reveals a lot about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
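To make this kind of pitch preprocessing concrete, here is a minimal sketch (the threshold value and the example vector are illustrative assumptions, not the cited study's exact choices) of rotating a 12-bin pitch-strength vector into a common tonal context and thresholding it into a binary description:

```python
# Sketch: transpose a 12-bin pitch (chroma) vector to a common key, then
# binarize it with a threshold. Illustrative only.

def transpose_to_c(pitch_vector, key):
    """Rotate the 12 pitch classes so the song's key maps to C (index 0)."""
    return [pitch_vector[(i + key) % 12] for i in range(12)]

def binarize(pitch_vector, threshold=0.5):
    """Mark a pitch class as present if its relative strength exceeds the threshold."""
    return [1 if p > threshold else 0 for p in pitch_vector]

# A segment in D major (key = 2) with strong D, F#, and A
segment = [0.1, 0.0, 0.9, 0.1, 0.0, 0.2, 0.8, 0.1, 0.0, 0.7, 0.1, 0.0]
transposed = transpose_to_c(segment, key=2)
print(binarize(transposed))  # -> [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
```

The same rotation appears in appendix A.4 as transpose_by_key; after binarization, every song's segments live in the same tonal context and can be compared directly.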
While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in the paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that do not capture more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to the 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study attempts to determine whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975
While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, its data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of those levels. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or dictionaries of pitch and timbre terms, to which all songs can be compared. For pitch, the original data is an N-by-12 matrix, where N is the number of time segments in the song and 12 is the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, musical sounds are not merely collections of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step, generating a table of chord-change frequencies for each song.
Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords by which songs can be compared. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k = 1 to k = N clusters, where N is a large number, running the GMM for each prior assumption of k clusters and computing the Bayes Information Criterion (BIC) for each model. The model with the lowest of the N BIC values is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
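This BIC-driven model selection can be sketched with scikit-learn's current GaussianMixture API (the thesis-era library exposed an older class with the same idea). This is a minimal illustration on synthetic stand-in data, not Mauch's actual pipeline:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_gmm_by_bic(frames, k_values, seed=0):
    """Fit a GMM for each candidate cluster count k and keep the model
    with the lowest Bayes Information Criterion (BIC)."""
    best_model, best_bic = None, np.inf
    for k in k_values:
        gmm = GaussianMixture(n_components=k, covariance_type="diag",
                              random_state=seed).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model, best_bic

# Toy stand-in for 12-dimensional timbre frames: two well-separated
# blobs, so the BIC should bottom out at k = 2.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0, 1, (100, 12)),
                    rng.normal(8, 1, (100, 12))])
model, bic = select_gmm_by_bic(frames, k_values=range(1, 6))
print(model.n_components)
```

Diagonal covariances are used here purely to keep the toy example stable; the principle of sweeping k and minimizing the BIC is the same regardless of covariance structure.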
For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms, creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but I would also have to pay careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources, for this thesis, to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but segments of songs, usually segments that showcase the chorus, are not reliable measures of the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, MusicBrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.
One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and will assign similar songs to different clusters.
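The "rich get richer" behavior and the role of α can be illustrated with a small simulation of the Chinese restaurant process, the sequential view of the DP described above. This is a sketch on synthetic draws, not part of the thesis pipeline:

```python
import random

def crp_sizes(n_points, alpha, seed=0):
    """Chinese-restaurant-process view of a DP: each new point joins an
    existing cluster with probability proportional to the cluster's size
    ("rich get richer"), or opens a new cluster with weight alpha."""
    rng = random.Random(seed)
    sizes = []                         # sizes[j] = points in cluster j
    for i in range(n_points):
        r = rng.uniform(0, i + alpha)  # total weight = points so far + alpha
        if r < alpha:
            sizes.append(1)            # start a new cluster
        else:
            r -= alpha
            for j, s in enumerate(sizes):
                if r < s:              # join cluster j, proportional to s
                    sizes[j] += 1
                    break
                r -= s
    return sizes

# Higher alpha tends to produce more clusters for the same number of points
for alpha in (0.1, 1.0, 10.0):
    print(alpha, len(crp_sizes(1000, alpha)))
```

Note that the probability of opening a new cluster, α / (i + α), shrinks as more points arrive, which is exactly the decreasing-new-group behavior described above.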
The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found in the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The
center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs to the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."
The last argument that a user inputs to the scikit-learn DPGMM implementation indicates the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused on tuning α to modify the number of clusters formed.
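The DPGMM class used here has since been removed from scikit-learn; its modern successor is BayesianGaussianMixture with a Dirichlet-process prior. A minimal sketch of the three arguments discussed above (α, the data, and the cluster upper bound), run on synthetic blobs rather than song features:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def fit_dp_clusters(X, alpha, max_clusters=50, seed=0):
    """Cluster X with a truncated Dirichlet Process mixture: alpha is the
    concentration parameter, max_clusters the upper bound on clusters."""
    dp = BayesianGaussianMixture(
        n_components=max_clusters,                       # upper bound
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,                # the alpha above
        covariance_type="diag",
        max_iter=500,
        random_state=seed,
    ).fit(X)
    return dp, dp.predict(X)

# Two well-separated synthetic blobs: the DP should occupy only a few of
# the 10 available components, leaving the rest with negligible weight.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (80, 4)), rng.normal(5, 0.5, (80, 4))])
dp, labels = fit_dp_clusters(X, alpha=1.0, max_clusters=10)
print(len(np.unique(labels)))
```

The key point is that n_components is only a truncation level: the Dirichlet-process prior decides how many of those components actually receive data.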
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical.

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of this artist on the service playme.com)
artist_terms: shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in, by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in, by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935) (max loudness during each segment)
segments_loudness_max_time: shape = (935) (time of the max loudness during each segment)
segments_loudness_start: shape = (935) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so the max is 1))
segments_start: shape = (935) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.86 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., the usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener has his or her own opinions about what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the amounts of different tones, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered for each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature, since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
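A minimal sketch of the tag filter applied to each song's artist_mbtags array; reading the tags out of the HDF5 files themselves is assumed to be handled by the MSD's accompanying getter utilities, which are not shown here:

```python
# Sketch of the artist-tag filter. Matching is case-insensitive so that
# tags like 'Techno' still count.
TARGET_GENRES = {
    'house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
    'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
    'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic',
}

def is_electronic(artist_mbtags):
    """True if any of the artist's MusicBrainz tags is a target EM genre."""
    return any(tag.strip().lower() in TARGET_GENRES for tag in artist_mbtags)

print(is_electronic(['British', 'Techno']))   # artist carries an EM tag
print(is_electronic(['rock', 'pop']))         # no EM tag
```

Using a set for TARGET_GENRES keeps each membership test O(1), which matters when iterating over a million songs.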
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − mean(CT)) (c_i − mean(c)) / (σ_CT σ_c)

where mean(CT) is the mean of the values in the template chord, σ_CT is the standard deviation of those values, and the quantities for c are analogous. Note that the summation is over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner, even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
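The template-matching step can be sketched as follows. For simplicity, this illustration uses Pearson correlation (np.corrcoef) as a stand-in for the Spearman's rho in the formula above, and tries the four chord qualities in all 12 transpositions:

```python
import numpy as np

# Binary chord templates with C as index 0 (1 = note present in the chord)
TEMPLATES = {
    "maj":  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],   # e.g. C, E, G
    "min":  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],   # e.g. C, Eb, G
    "dom7": [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],   # e.g. C, E, G, Bb
    "min7": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],   # e.g. C, Eb, G, Bb
}
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def best_chord(chroma):
    """Correlate a 12-element chroma frame against every template chord in
    every transposition; return the best-matching (root, quality) pair."""
    chroma = np.asarray(chroma, dtype=float)
    best, best_rho = None, -np.inf
    for quality, template in TEMPLATES.items():
        for root in range(12):
            # Rolling the template transposes the chord to a new root
            rho = np.corrcoef(np.roll(template, root), chroma)[0, 1]
            if rho > best_rho:
                best, best_rho = (NOTES[root], quality), rho
    return best

# A frame with most of its energy on C, E and G should read as C major
print(best_chord([1.0, .1, .1, .1, .9, .1, .1, .95, .1, .1, .1, .1]))
```

Since only the argmax over templates matters, any correlation measure that ranks the candidates consistently will select the same chord in unambiguous frames.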
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I have included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: pitch-processing pipeline illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho (e.g., F# major = (0,1,0,0,0,0,1,0,0,0,1,0)). (4) For two adjacent chords, calculate the change between them (in the example, a major-to-major change of step size 2, which has chord shift code 6, so chord_changes[6] += 1) and increment the count in a table of chord change frequencies (192 possible chord changes). The final result is a 192-element vector chord_changes, where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster that each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
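The per-song timbre counting described above can be sketched as follows; the 46-cluster lexicon is replaced here by a tiny GMM fit on synthetic frames:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def timbre_histogram(gmm, timbre_frames, duration_seconds):
    """Assign each of a song's timbre frames to its most likely cluster
    and return the per-second frequency of every cluster."""
    labels = gmm.predict(timbre_frames)                  # hard assignments
    counts = np.bincount(labels, minlength=gmm.n_components)
    return counts / duration_seconds                     # per-second rate

# Tiny stand-in for the 46-cluster timbre lexicon
rng = np.random.default_rng(2)
lexicon_frames = np.vstack([rng.normal(0, 1, (50, 12)),
                            rng.normal(10, 1, (50, 12))])
gmm = GaussianMixture(n_components=2, random_state=0).fit(lexicon_frames)

song_frames = lexicon_frames             # pretend these frames are one song
hist = timbre_histogram(gmm, song_frames, duration_seconds=200.0)
print(hist.shape, hist.sum())            # 100 frames over 200 s sum to 0.5
```

The minlength argument to np.bincount matters: it keeps the histogram the same length for every song, even when some timbre clusters never appear in a given track.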
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
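The duplication strategy can be sketched as below. The number of timbre copies shown is an illustrative choice, not a value fixed by the text; the ×10 scaling corresponds to the constant arrived at later in this section:

```python
import numpy as np

def build_feature_vector(chord_changes, timbre_freqs,
                         timbre_copies=3, scale=10.0):
    """Concatenate the 192 chord-change frequencies with several copies
    of the 46 timbre frequencies (to balance pitch against timbre), then
    scale the whole vector so alpha can be tuned in a sensible range."""
    features = np.concatenate([chord_changes] + [timbre_freqs] * timbre_copies)
    return scale * features

# Dummy per-second frequency vectors standing in for a real song
song = build_feature_vector(np.zeros(192), np.ones(46))
print(song.shape)   # (192 + 3 * 46,) == (330,)
```

Repeating a feature block is equivalent to up-weighting its contribution to the (Euclidean-style) distances the mixture model sees, which is why it serves as a crude substitute for per-feature weights.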
After this modification I tweaked a few more parameters before obtaining my
final results Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second but it also had the undesired effect
of making the data too small Timbre and pitch frequencies per second were almost
always less than 10 and many times hovered as low as 0002 for nonzero values
Because all of the values were very close to each other using common values of α
in the range of 01 to 1000-2000 was insufficient to push the songs into different
clusters As a result every song fell into the same cluster Increasing the value
of α by several orders of magnitude to well over 10 million fixed the problem but
this solution presented two problems First tuning α to experiment with different
ways to cluster the music would be problematic since I would have to work with
an enormous range of possible values for α Second pushing α to such high values
is not appropriate for the Dirichlet Process Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters but a value
of α that high is in principle always assigning each new song to a new cluster On
the other hand varying α between 01 and 1000 for example presents a much wider
range of flexibility when assigning clusters While this may be possible by varying
the values of α an extreme amount with the data as it currently is we are using
the Dirichlet Process in a way it should mathematically not be used Therefore
multiplying all of the data by a constant value so that we can work in the appropriate
range of α is the ideal approach After some experimentation I found that k=10 was
an appropriate scaling factor After initial runs of the Dirichlet Process I found out
that there was a slight issue with some of the earlier songs Since I had only artist
genre tags not specific song tags for each song I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre including
the generic term rsquoelectronicrsquo There were some bands mostly older ones from the
1960s and 1970s like Electric Light Orchestra which had some electronic music but
28
mostly featured rock funk disco or another genre Given that these artists featured
mostly non-electronic songs I decided to exclude them from my study and generate
a blacklist indicating these music artists While it was infeasible to look through
every single song and determine whether it was electronic or not I was able to look
over the earliest songs in each cluster These songs were the most important to verify
as electronic because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for
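The blacklist step described above can be sketched as follows (a minimal illustration, not the thesis code; the helper name and the song entries are hypothetical, with Electric Light Orchestra as the example artist from the text):

```python
# Minimal sketch of the artist blacklist described above: songs by artists who
# mostly produced non-electronic music are dropped before clustering.
# The dictionaries mirror the metadata fields pulled in Appendix A.1.
ARTIST_BLACKLIST = {'Electric Light Orchestra'}

def remove_blacklisted(songs, blacklist):
    """Keep only songs whose artist is not on the blacklist."""
    return [s for s in songs if s['artist_name'] not in blacklist]

songs = [
    {'artist_name': 'Electric Light Orchestra', 'year': 1977},
    {'artist_name': 'Kraftwerk', 'year': 1974},
]
print(remove_blacklisted(songs, ARTIST_BLACKLIST))  # keeps only the Kraftwerk entry
```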
The goal of this thesis is to identify different groups in which EM songs are
clustered and identify the most unique artists and genres While the second task is
fairly simple because it requires looking at the earliest songs in each cluster the
effectiveness of the first is difficult to gauge While I can look at the average chord change
and timbre category frequencies in each category as well as other metadata putting
more semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process For this reason
I ran the Dirichlet Process on the feature set with values of α = 0.05, 0.1, and 0.2 and
compared the clustering in each category examining similarities and differences in
the clusters formed in each scenario in the Discussion section For each value of α I
set the upper limit of components or clusters allowed to 50 The values of α I used
resulted in 9, 14, and 19 clusters formed respectively
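The clustering step can be sketched with scikit-learn's `BayesianGaussianMixture`, which implements a truncated Dirichlet Process mixture; the feature vectors below are synthetic stand-ins for the chord-change and timbre frequency features (the 50-component cap matches the setup described above; everything else is illustrative, not the thesis's exact configuration):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in for the per-song feature vectors (three loose groups).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(m, 0.5, size=(150, 2)) for m in (0.0, 4.0, 8.0)])

def dp_cluster_counts(X, alpha, max_components=50):
    """Fit a truncated DP Gaussian mixture and report how many clusters it uses."""
    dp = BayesianGaussianMixture(
        n_components=max_components,            # upper limit on clusters, as in the text
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,       # the concentration parameter alpha
        max_iter=200,
        random_state=0,
    ).fit(X)
    return len(np.unique(dp.predict(X)))

for alpha in (0.05, 0.1, 0.2):
    print(alpha, dp_cluster_counts(X, alpha))
```

Larger values of α make the process more willing to open new clusters, which is why the three runs above produced 9, 14, and 19 clusters on the real data.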
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05 the Dirichlet Process split the songs into 9 clusters Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially so cluster numbers 5 7 and 10 are
skipped)
Figure 3.1 Song year distributions for α = 0.05
For each value of α I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results The green
lines correspond to timbre and the blue lines to pitch
Figure 3.2 Timbre and pitch distributions for α = 0.05
A table of each cluster formed the number of songs in that cluster and descriptions
of pitch timbre and rhythmic qualities characteristic of songs in that cluster are
shown below
Cluster Song Count Characteristic Sounds
0 6481 Minimalist industrial space sounds dissonant chords
1 5482 Soft New Age ethereal
2 2405 Defined sounds electronic and non-electronic instru-
ments played in standard rock rhythms
3 360 Very dense and complex synths slightly darker tone
4 4550 Heavily distorted rock and synthesizer
6 2854 Faster paced 80s synth rock acid house
8 798 Aggressive beats dense house music
9 1464 Ambient house trancelike strong beats mysterious
tone
11 1597 Melancholy tones New wave rock in 80s then starting
in 90s downtempo trip-hop nu-metal
Table 3.1 Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed but 2 clusters contained only
one song each I listened to both of these songs and they did not sound unique so
I discarded them from the clusters) Again the song distributions timbre and pitch
distributions and cluster descriptions are shown below
Figure 3.3 Song year distributions for α = 0.1
Figure 3.4 Timbre and pitch distributions for α = 0.1
Cluster Song Count Characteristic Sounds
0 1339 Instrumental and disco with 80s synth
1 2109 Simultaneous quarter-note and sixteenth note rhythms
2 4048 Upbeat chill simultaneous quarter-note and eighth
note rhythms
3 1353 Strong repetitive beats ambient
4 2446 Strong simultaneous beat and synths synths defined but
echo
5 2672 Calm New Age
6 542 Hi-hat cymbals dissonant chord progressions
7 2725 Aggressive punk and alternative rock
9 1647 Latin rhythmic emphasis on first and third beats
11 835 Standard medium-fast rock instruments/chords
16 1152 Orchestral especially violins
18 40 “Martian alien” sounds no vocals
20 1590 Alternating strong kick and strong high-pitched clap
28 528 Roland TR-like beats kick and clap stand out but fuzzy
Table 3.2 Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2 there were a total of 22 clusters formed 3 of the clusters consisted
of 1 song each none of which were particularly unique-sounding so I discarded them
for a total of 19 significant clusters Again the song distributions timbre and pitch
distributions and cluster descriptions are shown below
Figure 3.5 Song year distributions for α = 0.2
Figure 3.6 Timbre and pitch distributions for α = 0.2
Cluster Song Count Characteristic Sounds
0 4075 Nostalgic and sad-sounding synths and string instru-
ments
1 2068 Intense sad cavernous (mix of industrial metal and am-
bient)
2 1546 Jazz/funk tones
3 1691 Orchestral with heavy 80s synths atmospheric
4 343 Arpeggios
5 304 Electro ambient
6 2405 Alien synths eerie
7 1264 Punchy kicks and claps 80s/90s tilt
8 1561 Medium tempo 4/4 time signature synths with intense
guitar
9 1796 Disco rhythms and instruments
10 2158 Standard rock with few (if any) synths added on
12 791 Cavernous minimalist ambient (non-electronic instru-
ments)
14 765 Downtempo classic guitar riffs fewer synths
16 865 Classic acid house sounds and beats
17 682 Heavy Roland TR sounds
22 14 Fast ambient classic orchestral
23 578 Acid house with funk tones
30 31 Very repetitive rhythms one or two tones
34 88 Very dense sound (strong vocals and synths)
Table 3.3 Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently Not
surprisingly the distributions of the years of songs in most of the clusters were skewed
to the left because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2) However some of the distributions vary significantly
for individual clusters and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were For example for α = 0.1 Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909 two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks) [13] coincides with when the instruments were first
manufactured in 1980 Not surprisingly this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s However there were a few songs
in that cluster that came out before 1980 While these songs did not clearly use the
Roland TR machines they may have contained similar sounds that predated the
machines and were truly novel
First looking at α = 0.05 we see that all of the clusters contain a significant
number of songs although clusters 3 and 8 are notably smaller Cluster 3 contains
a heavier left tail indicating a larger number of songs from the 70s 80s and 90s
Inside the cluster the genres of music varied significantly from a traditional music
lens That is the cluster contained some songs with nearly all traditional rock
instruments others with purely synths and others somewhere in between all which
would normally be classified as different EM genres However under the Dirichlet
Process these songs were lumped together with the common theme of dense
melodies (as opposed to minimalistic repetitive or dissonant sounds) The most
prominent artists from the earlier songs are Ashra and John Hassell who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel The other small cluster number 8 contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats
Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but increases rapidly in popularity
This cluster contains songs with hypnotically repetitive rhythm strong and ethereal
synths and an equally strong drum-like beat Given the emergence of trance in
the 1990s and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s this distribution of years makes sense Looking
at the earliest artists in this cluster one that accurately predates the later music
in the cluster is Jean-Michel Jarre a French composer who pioneered ambient and
electronic music [14] One of his songs Les Chants Magnétiques IV contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths While the song sounds ambient at its normal speed
playing the song 1.5 times the normal speed resulted in a thumping fast-paced 16th
note rhythm that combined with the ethereal synths that contain certain chord
progressions sounded very similar to trance music In fact I found that stylistically
trance music was comparable to house and ambient music increased in speed Trance
music was a term not used extensively until the early 1990s but ambient and house
music were already mainstream by the 1980s so it would make sense that trance
evolved in this manner However this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres Lastly we look at the timbre category and chord change distributions
for each cluster In theory these clusters should have significantly different peaks
of chord changes and timbre categories reflecting different pitch arrangements and
instruments in each cluster The type 0 chord change corresponds to major →
major with no note change type 60 minor → minor with no note change type
120 dominant 7th major → dominant 7th major with no note change and type 180
dominant 7th minor → dominant 7th minor with no note change It makes sense that
type 0 60 120 and 180 chord changes are frequently observed because it implies
that chords in the song occurring next to each other are remaining in the same key
for the majority of the song The timbre categories on the other hand are more
difficult to intuitively interpret Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8] While
this strategy worked in Mauch's study given the time and resources at my disposal
this strategy was not practical in my study I ended up comparing my subjective
summaries of each cluster and comparing the charts to see whether certain peaks
in the timbre categories corresponded to specific tones Strangely for α = 0.05 the
timbre and chord change data is very similar for each cluster This problem does not
occur when α = 0.1 or 0.2 where the graphs vary significantly and correspond
to some of the observed differences in the music In summary below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time
• Jean-Michel Jarre ambient and house music complicated synthesizer arrangements
• Cabaret Voltaire orchestral electronic music
• Paul Horn new age
• Brian Eno ambient music
• Manuel Göttsching (Ashra) synth-heavy ambient music
• Killing Joke industrial metal
• John Foxx minimalist and dark electronic music
• Fad Gadget house and industrial music
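The chord-change type numbers quoted earlier (0, 60, 120, 180) follow directly from the category encoding used in Appendix A.2, where the chord qualities are numbered 1 = major, 2 = minor, 3 = dominant 7th major, 4 = dominant 7th minor. A small check of that arithmetic:

```python
def chord_change_category(quality_from, quality_to, note_shift):
    """Encode a chord change as one of 192 categories (as in Appendix A.2):
    key_shift in 1..16 combines the two qualities, and note_shift in 0..11
    is the root-note movement (0 = no note change)."""
    key_shift = 4 * (quality_from - 1) + quality_to
    return 12 * (key_shift - 1) + note_shift

print(chord_change_category(1, 1, 0))  # major -> major, no note change: 0
print(chord_change_category(2, 2, 0))  # minor -> minor, no note change: 60
print(chord_change_category(3, 3, 0))  # dom 7th major -> dom 7th major: 120
print(chord_change_category(4, 4, 0))  # dom 7th minor -> dom 7th minor: 180
```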
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music Some research
revealed that these artists were indeed groundbreaking for their time so my findings
are consistent with existing literature The difference however between existing
accounts and mine is that from a quantitatively computed perspective I found some
new connections (like observing that one of Jarre's works when sped up sounded
very similar to trance music)
For larger values of α it is not only worth looking at interesting phenomena in
the clusters formed for that specific value but also comparing them to the clusters
formed at other values of α Since we are increasing the value of α more clusters will
be formed and the distinctions between each cluster will be more nuanced With
α = 0.1 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of
only one song each and upon listening neither of these songs sounded particularly
unique so I threw those two clusters out and analyzed the remaining 14 Comparing
these clusters to the ones formed with α = 0.05 I found that some of the clusters
mapped over nicely while others were more difficult to interpret For example cluster
3 (α = 0.1) contained a similar number of songs and a similar distribution of the
years the songs were released to cluster 9 (α = 0.05) Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s
Both clusters also contain similar types of music house beats and ethereal synths
reminiscent of ambient or trance music However when I looked at the earliest
artists in cluster 3 (α = 0.1) they were different from the earliest artists in cluster 9 (α = 0.05)
One particular artist Bill Nelson stood out for having a particularly novel song
ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and
twangy synth beat that when sped up sounded like minimalist acid house music
While the α = 0.05 run differentiated mostly on general moods and classes of
instruments (like rock vs non-electronic vs electronic) the α = 0.1 run picked
up more nuanced instrumentation and mood differences For example cluster 16 (α = 0.1)
contained songs that featured orchestral string instruments especially violin The
songs themselves varied significantly according to traditional genres from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal
band which contained violin interludes This clustering raises an interesting point
that music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds Another cluster 28 (α = 0.1) features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s) Yet another cluster 6 (α = 0.1) contains a particularly heavy left tail indicating
a style more popular in the 1980s and its characteristic sound hi-hat cymbals
is also a specialized instrument This specialization does not match up particularly
strongly with the clusters when α = 0.05 That is a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster For
example the last timbre category is significantly pronounced for clusters 5 and 18
and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds
so it would make sense that cluster 5 which was mainly calm New Age also
contained vocal-free ethereal and space-y sounds It was also interesting to note
that certain clusters like 28 contained one timbre category that completely dominated
all the others Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine which matches the graph
Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre
category were noted for containing strong and repetitive beats For this cluster I
added the following artists and their contributions to the general list of novel artists
• Bill Nelson minimalist house music
• Vangelis orchestral compositions with electronic notes
• Rick Wakeman rock compositions with spacy-sounding synths
• Kraftwerk synth-based pop music
Finally we look at α = 0.2 With this parameter value the Dirichlet Process resulted
in 22 clusters formed 3 of these clusters contained only one song each and upon
listening to each of these songs I determined they were not particularly unique and
discarded them for a total of 19 remaining clusters Unlike the previous two values of
α where the clusters were relatively easy to subjectively differentiate this one was
quite difficult Slightly more than half of the clusters 10 out of 19 contained under
1000 songs and 3 contained under 100 Some of the clusters were easily mapped
to clusters in the other two α values like cluster 17 (α = 0.2) which contains Roland TR
drum machine sounds and is comparable to cluster 28 (α = 0.1) However many of the other
classifications seemed more dubious Not only did the songs within each cluster seem
to often vary significantly but the differences between many clusters appeared nearly
indistinguishable The chord change and timbre charts also support the difficulty
in distinguishing different clusters The y-axis values for all of the clusters are quite small
implying that many of the timbre values averaged out because the songs were quite
different in each cluster Essentially the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 (α = 0.1) for example The only exceptions
to these numbers were clusters 30 and 34 but there were so few songs in each of
these clusters that they represent only a small amount of the dataset Therefore I
concluded that the Dirichlet Process with α = 0.2 did an inadequate job of
clustering the songs Overall the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and resulting in clusters a minimally trained ear could not differentiate
From this analysis the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1 and the most
novel artists along with their contributions are summarized in the findings where
α = 0.05 and α = 0.1
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies to address
those weaknesses then I offer potential paths for researchers to build upon my ex-
periment and offer closing words regarding this thesis
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment there
were various factors some beyond my control and others within my control but
unrealistic to address given the time and resources I had The largest issue was the dataset I
was working with While the MSD contained roughly 23000 electronic music songs
according to my classifications these songs did not come close to all of the electronic
music that was available From looking through the tracks I did see many important
artists meaning that there was some credibility to the dataset However there were
several other artists I was surprised to see missing and the artists included contained
only a limited number of popular songs Some traditionally defined genres like
dubstep were missing entirely from the dataset and the most recent songs came
from the year 2010 which meant that the past 5 years of rapid expansions in EM
were not accounted for Building a sufficient corpus of EM data is very difficult
arguably more so than for other genres because songs may be remixed by multiple
artists further blurring the line between original content and modifications For this
reason I considered my thesis to be a proof of concept Although the data I used
may not be ideal I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata
With respect to how I implemented the Dirichlet Process and constructed the
features my methodology could have been more extensive with additional time and
resources Interpreting the sounds in each song and establishing common threads is a
difficult task and unlike Pandora which used trained music theory experts to analyze
each song I relied on my own ears and anecdotal knowledge of EM Given the lack of
formal literature quantitatively analyzing EM and the resources I had this was my
best realistic option but was also not ideal The second notable weakness which was
more controllable was determining what exactly constitutes an EM song My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres However this strategy is not
always effective since some artists contain only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres To prevent
these songs from appearing in the dataset I would need to load another dataset
from a group called Last.fm which contains user-generated tags at the song level
Another more addressable weakness in my experiment was graphically analyzing the
timbre categories While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations the timbre categories
were never formally defined That is while I knew the Bayesian Information Criterion
was lowest when there were 46 categories I did not associate each timbre category
with a sound Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial I could not only
eyeball the measurements on each graph for timbre like I did in this thesis but also
use them to confirm the sounds I observed for each cluster Finally while my feature
selection contained careful preprocessing based on other studies that normalized
measurements between all songs there are additional ways I could have improved the
feature set For example one study looks at more advanced ways to isolate specific
timbre segments in a song identify repeating patterns and compare songs to each
other in terms of the similarity of their timbres [15] More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is on effectively clustering songs into distinct categories
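The BIC-based choice of timbre-category count mentioned above can be sketched as follows (synthetic data and an illustrative range of component counts; the thesis reports the minimum at 46 categories on the real timbre frames, which this toy example will not reproduce):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic timbre frames drawn from three well-separated groups.
rng = np.random.RandomState(0)
frames = np.vstack([rng.normal(m, 1.0, size=(200, 3)) for m in (0.0, 6.0, 12.0)])

# Fit Gaussian mixtures with increasing component counts and record the BIC.
bic_by_k = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic_by_k[k] = gm.bic(frames)

best_k = min(bic_by_k, key=bic_by_k.get)  # component count minimizing BIC
print(best_k)
```

The same loop, run over a much wider range of component counts on the real timbre frames, is the kind of procedure that selects the number of timbre categories.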
4.2 Future Work
Future work in this area quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists would involve tighter definitions proce-
dures evaluations of whether clustering was effective and music scrutiny All of the
weaknesses mentioned in the previous section barring perhaps the songs available in
the Million Song Dataset can be addressed with extensions and modifications to the
code base I created Addressing the greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify but such an endeavor is very ambitious
and beyond the scope of any individual or small group research project without
extensive funding and influence Once these problems are resolved and methods for
accessing songs from the dataset and comparing songs to each other are in place
the next steps would be to further analyze the results How do the
most unique artists for their time compare to the most popular artists Is there
considerable overlap How long does it take for a style to grow in popularity if it even
does And lastly how can these findings be used to compose new genres of music and
envision who and what will become popular in the future All of these questions may
require supplementary information sources with respect to the popularity of songs
and artists for example and many of these additional pieces of information can be
found on the website of the MSD
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music As data analytics
grows even more and groups such as Spotify amass greater amounts of information
and deeper insights on that information this relatively new field of study will hope-
fully grow EM is a dynamic energizing and incredibly expressive type of music
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly analyzed from a curious outsiderrsquos perspective qualitatively
described but not examined as thoroughly from a mathematical angle
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print('found electronic music song at {0} seconds'.format(time.time() - start_time))
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(
                count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = ('/scratch/network/mssilver/mssilver/msd_data/raw_' +
                   re.sub('/', '', sys.argv[1]) + '.txt')
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# regex reconstructed; the exact pattern was garbled in this transcript
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg)
                          for seg in segments_pitches_old_smoothed]
    print('found most likely chords at {0} seconds'.format(time.time() - time_start))
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print('calculated chord changes at {0} seconds'.format(time.time() - time_start))

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean value of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print('found most likely timbre categories at {0} seconds'.format(time.time() - time_start))
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg)
                   for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print('preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start))
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print('file merging complete at time {0}'.format(time.time() - time_start))
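The helper `msd_utils.find_most_likely_chord` used above comes from the author's `msd_utils` module, which this transcript does not reproduce in full. A plausible sketch, assuming simple template matching against the chord templates of Appendix A.4 (the function body here is a reconstruction, not the author's code):

```python
import numpy as np

def templates_for(intervals):
    """Build the twelve 12-bin templates for one chord quality by rotating a
    base interval pattern through all roots (C = index 0 .. B = index 11)."""
    base = np.zeros(12)
    base[list(intervals)] = 1.0
    return [np.roll(base, root) for root in range(12)]

# Qualities numbered as in Appendix A.2: 1 = major, 2 = minor,
# 3 = dominant 7th major, 4 = dominant 7th minor.
CHORD_TEMPLATES = {
    1: templates_for((0, 4, 7)),
    2: templates_for((0, 3, 7)),
    3: templates_for((0, 4, 7, 10)),
    4: templates_for((0, 3, 7, 10)),
}

def find_most_likely_chord(pitch_vector):
    """Return (quality, root) of the template with the largest dot product
    against a 12-bin pitch-class vector."""
    best, best_score = None, float('-inf')
    for quality, rows in CHORD_TEMPLATES.items():
        for root, template in enumerate(rows):
            score = float(np.dot(pitch_vector, template))
            if score > best_score:
                best, best_score = (quality, root), score
    return best

c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1.0
print(find_most_likely_chord(c_major))  # (1, 0): a C major triad
```

Rotating a single interval pattern reproduces the twelve per-root rows listed for each template in Appendix A.4.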
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# regex reconstructed; the exact pattern was garbled in this transcript
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print('getting timbre frames for song in directory {0}, {1} seconds '
                      'after start of program'.format(edm_textfile, time.time() - time_start))
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print('finished file {0}, {1} seconds after start of program'.format(
            edm_textfile, time.time() - time_start))

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-03, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-04, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.
3.2.2 α = 0.1  33
3.2.3 α = 0.2  38
3.3 Analysis  46
4 Conclusion  53
4.1 Design Flaws in Experiment  53
4.2 Future Work  55
4.3 Closing Remarks  56
A Code  57
A.1 Pulling Data from the Million Song Dataset  57
A.2 Calculating Most Likely Chords and Timbre Categories  58
A.3 Code to Compute Timbre Categories  60
A.4 Helper Methods for Calculations  61
Bibliography  68
List of Tables
3.1 Song cluster descriptions for α = 0.05  33
3.2 Song cluster descriptions for α = 0.1  38
3.3 Song cluster descriptions for α = 0.2  45
List of Figures
1.1 A user's taste profile generated by Spotify  4
1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)  8
2.1 scikit-learn example of GMM vs DPGMM and tuning of α  15
2.2 Number of Electronic Music Songs in Million Song Dataset from Each Year  26
3.1 Song year distributions for α = 0.05  31
3.2 Timbre and pitch distributions for α = 0.05  32
3.3 Song year distributions for α = 0.1  35
3.4 Timbre and pitch distributions for α = 0.1  37
3.5 Song year distributions for α = 0.2  41
3.6 Timbre and pitch distributions for α = 0.2  44
Chapter 1
Introduction
1.1 Background Information
Electronic Music (EM) is an increasingly popular genre of music with an immense
presence and influence on modern culture. Because the genre is new as a whole and
is arguably more loosely structured than other genres (technology has enabled the
creation of a wide range of sounds and the easy blending of existing and new sounds
alike), formal analysis, especially mathematical analysis, of the genre is fairly limited
and has only begun growing in the past few years. As a fan of EM, I am interested in
exploring how the genre has evolved over time. More specifically, my goal with this
project was to design some structure or model that could help me identify which EM
artists have contributed the most stylistically to the genre. Oftentimes famous EM
artists do not create novel-sounding music but rather popularize an existing style,
and the motivation of this study is to distinguish those who have stylistically contributed
the most to the EM scene from those who have merely popularized aspects of it.
As the study progressed, the manner in which I constructed my model lent itself to
a second goal of the thesis: proposing new ways in which EM genres can be imagined.
While there exists an extensive amount of research analyzing music trends from
a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM
from a mathematical perspective, and especially with respect to any computationally
measurable trends in the genre, is close to nonexistent. EM has been analyzed to a
lesser extent than other common genres of music in the academic world, most likely
because it has existed for a shorter amount of time and is less rooted in prominent
social and cultural events. In fact, the first published reference work on EM did not
exist until 2012, when Professor Mark J. Butler from Northwestern University edited
and published Electronica, Dance and Club Music, a collection of essays exploring
EM genres and culture [1]. Furthermore, there are very few comprehensive visual
guides that allow a user to relate every genre to each other and easily observe how
different genres converge and diverge. While conducting research, the best guide I
found was not a scholarly source but an online guide created by an EM enthusiast:
Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific
genres grouped by more general genres and represents chronological evolutions by
connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene
I could find. However, the guide's analysis is very qualitative. While each subgenre
contains an explanation of typical rhythms and sounds and includes well-known
songs indicative of the style, the guide was created by someone drawing on historical and
personal knowledge of EM. My model, which creates music genres by chronologically
ordering songs and then assigning them to clusters, is a different approach toward
imagining the entire landscape of EM. The results may confirm Ishkur's Guide's
findings, in which case his guide is given additional merit with mathematical evidence,
or they may differ, suggesting that there may be better ways to group EM
genres. One advantage that guides such as Ishkur's and historically based scholarly
works have over my approach is that those models are history-sensitive and therefore
may group songs in a way that historically makes sense. On the other hand, my
model is history-agnostic and may not capture the historical context of songs when
clustering. However, I believe that there is still significant merit to my research.
Instead of classifying genres of music by the early genres that led to them, my approach
gives the most credit to the artists and songs that were the most innovative for their
time, and it may reveal musical styles that are more similar to each other
than history would otherwise imply. This way of thinking about music genres, while
unconventional, is another way of imagining EM.
The practice of quantitatively analyzing music has exploded in the last decade,
thanks to technological and algorithmic advances that allow data scientists to
constructively sift through troves of music and listener information. In the literature
review I will focus on two particular organizations that have contributed greatly to
the large-scale mathematical analysis of music: Pandora, a website that plays songs
similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics
firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly
feature [3]. After evaluating the relevance of these sources to my thesis work, I will
then look over the relevant academic research and evaluate what this research can
contribute.
1.2 Literature Review
The quantitative analysis of music generally falls into two categories: research
conducted by academics and academic organizations for scholarly purposes, and research
conducted by companies and primarily targeted at consumers. Looking first at the
consumer-based research, Spotify and Pandora are two of the most prominent
groups and the two I decided to focus on. Spotify is a music streaming service where
users can listen to albums and songs from a wide variety of artists or listen to weekly
playlists generated based on the music the user and the user's friends have listened to.
The weekly playlist, called the Discover Weekly playlist, is a relatively new feature in
Spotify and is driven by music analysis algorithms created by Echo Nest. Using
the Echo Nest code interface, Spotify creates a "taste profile" for each user, which
assesses attributes such as how often a user branches out to new styles of music, how
closely the user's streamed music follows popular Billboard music charts, and so on.
Spotify also looks at the artists and songs the user streamed and creates clusters
of different genres that the user likes (see figure 1.1). The taste profile and music
clusters can then be used to generate playlists geared to a specific user. The genres
in the clusters come from a list of nearly 800 names, which are derived by scraping
the Internet for trending terms in music as well as by training various algorithms on a
regular basis by "listening" to new songs [4][5].
Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the
landscape of established and emerging genres of music, the methodology is limited to
pre-defined genres of music. This may serve as a good starting point to compare my
final results to, but my study aims to be as context-free as possible by attaching no
preconceived notions of music styles or genres, instead looking at features that can
be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on
existing genres, Pandora takes a very low-tech and context-free approach to music
clustering. Pandora created the Music Genome Project, a multi-year undertaking in
which skilled music theorists listened to a large number of songs and analyzed up to
450 characteristics in each song [6]. Pandora's approach is appealing to the aim of
my study since it does not take any preconceived notion of what a genre of music
is, instead comparing songs on common characteristics such as pitch, rhythm, and
instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists
at my disposal, nor do I have 10 years to perform such calculations like the dedicated
workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music
Genome Project is intellectual property, so at best I can only rely on the abstract
concepts of the Music Genome Project to drive my study.
In the academic realm there are no existing studies analyzing quantifiable changes in
EM specifically, but there exist a few studies that perform such analysis on popular
Western music in general. One such study is Measuring the Evolution of
Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all
common genres. Using the Million Song Dataset, a free public database of songs
each containing metadata (see section 1.3), the study focuses on the attributes pitch,
timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of
the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients
(MFCCs) of a transformed sound signal. More informally, it refers to the sound color,
texture, or tone quality, and is associated with instrument types, recording resources,
and production techniques. In other words, two sounds that have the same pitch
but different tones (for example, a bell and a voice) are differentiated by their timbres.
There are 12 MFCCs that define the timbre of a given sound. Finally, loudness
refers to how intrinsically loud the music sounds, not loudness that a listener can
manipulate while listening to the music. Loudness is the first MFCC of the timbre
of a sound [7]. The study concluded that over time, music has been becoming louder
and less diverse:
The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in
a few ways. First, it utilizes the Million Song Dataset, which addresses the issue
of legally obtaining music metadata. As mentioned in section 1.3, the only legal
way to obtain playable music for this study would have been to purchase every song I
would include, which is infeasible. While the Million Song Dataset does not contain
the audio files in playable format, it does contain audio features and metadata that
allow for in-depth analysis. In addition, working with the dataset removes the
work of extracting features from raw audio files, saving an extensive amount of time
and energy. Second, the study establishes specifics for what constitutes a trend
in music. Pitch, timbre, and loudness are core features of music, and examining their
distributions among songs over time reveals a lot of information about how
the music industry and consumers' tastes have evolved. While these are not all of the
features contained in a song, they serve as a good starting point. Third, the study
defines mathematical ways to capture music attributes and measure their change
over time. For example, pitches are transposed into the same tonal context, with
binary discretized pitch descriptions based on a threshold, so that each song can be
represented with vectors of pitches that are normalized and compared to other songs.
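As a rough illustration of that normalization step (my own sketch, not code from the study; the threshold value is an arbitrary choice for illustration), a 12-bin pitch-strength vector can be rotated into a common tonal context and then binarized, much like the transpose_by_key helper in Appendix A.4:

```python
# Hedged sketch: rotate a 12-bin pitch-strength vector into a common tonal
# context (key 0 = C), then binarize each pitch against a threshold.
def transpose_and_binarize(pitch_vector, key, threshold=0.5):
    rotated = [pitch_vector[(i + key) % 12] for i in range(12)]
    return [1 if p >= threshold else 0 for p in rotated]

# A C-major-like segment (strong C, E, G) reduces to a binary chord shape:
print(transpose_and_binarize(
    [0.9, 0.1, 0.2, 0.1, 0.8, 0.1, 0.0, 0.7, 0.1, 0.2, 0.1, 0.1], key=0))
# [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
```

The binarized shape matches the C major template row in Appendix A.4, so songs in different keys become directly comparable after rotation.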
While this study lays some solid groundwork for capturing and analyzing numeric
qualities of music, it falls short of addressing my goals in a couple of ways.
First, it does not perform any analysis with respect to music genre. While the
analysis performed in the paper could easily be applied to a list of songs in a specific
genre, certain genres might have unique sounds and rhythms relative to other genres
that would be worth studying in greater detail. Second, the study only measures
general trends in music over time. The models used to describe changes are simple
regressions that do not look at more nuanced changes. For example, what styles of
music developed over certain periods of time? How rapid were those changes? Which
styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes
contemporary popular Western music from the 1960s to 2010s by comparing numerical
data on the pitches and timbre of a corpus of 17,000 songs that appeared on the
Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution
of Contemporary Western Popular Music, Mauch's study also creates abstractions
of pitch and timbre in order to provide a consistent and meaningful semantic
interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a
step further by using genre tags from Last.fm, a music website, and constructing a
hierarchy of music genres using hierarchical clustering. Additionally, the study takes
a crack at determining whether a particular band, the Beatles, was musically groundbreaking
for its time or merely playing off sounds that other bands had already used.
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)
While both Measuring the Evolution of Contemporary Western Popular Music
and Mauch's study created abstractions of pitch and timbre, Mauch's study is more
appealing with respect to my goal because its end results align more closely with
mine. Additionally, the data processing pipeline offers several layers of abstraction,
and depending on my progress I would be able to achieve at least one of the
levels of abstraction. As shown in figure 1.2, each segment of a raw audio file
is first broken down into its 12 timbre MFCCs and pitch components. Next, the
study constructs "lexicons", or dictionaries of pitch and timbre terms that all
songs can be compared to. For pitch, the original data is an N-by-12 matrix,
where N is the number of time segments in the song and 12 the number of notes in
an octave. Each time segment contains the relative strengths of each of the 12
pitches.
However, musical sounds are not merely a collection of pitches but, more
precisely, chords. Furthermore, the similarity of two songs is not determined by
the absolute pitches of their chords but rather by the progression of chords in
the song, all relative to each other. For example, if all the notes in a song are
transposed by one step, the song will sound different in terms of absolute pitch,
but it will still be recognized as the original because all of the relative
movements from each chord to the next are the same. This phenomenon is captured
in the pitch data by finding the most likely chord played at each time segment,
then counting the change to the next chord at each time step and generating a
table of chord change frequencies for each song.
Constructing the timbre lexicon is more complicated, since there is no easy
analogue to chords to compare songs by. Mauch's study utilizes a Gaussian Mixture
Model (GMM), iterating over k = 1 to k = N clusters, where N is a large number,
running the GMM under each prior assumption of k clusters and computing the
Bayes Information Criterion (BIC) for each model. The value of k with the lowest
of the N BIC values is selected. That model contains k different timbre clusters,
and each cluster contains the mean timbre value for each of the 12 timbre components.
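The BIC sweep described above can be sketched with scikit-learn's GaussianMixture, which exposes a bic() method. The function name build_timbre_lexicon and the toy data below are illustrative, not code from the thesis:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def build_timbre_lexicon(frames, max_k=10, seed=0):
    """Fit GMMs with k = 1..max_k clusters and keep the model whose
    Bayes Information Criterion (BIC) is lowest."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    # best_model.means_ holds one 12-dimensional mean per timbre cluster
    return best_model

# toy stand-in data: 200 synthetic 12-dimensional "timbre frames"
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0, 1, (100, 12)), rng.normal(5, 1, (100, 12))])
lexicon = build_timbre_lexicon(frames)
```

In the study itself the sweep runs over a much larger range of k; the small max_k here just keeps the toy example fast.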
For my research I decided that the pitch and timbre lexicons would be the most
realistic level of abstraction I could obtain. Mauch's study adds an additional
layer to pitch and timbre by identifying the most common patterns of chord
changes and most common timbre rhythms, creating more general tags from these
combined terms, such as "stepwise changes indicating modal harmony" for a pitch
topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this
final layer of abstraction for my study. First, attaching semantic
interpretations to the pitch and timbral lexicons is a difficult task. For
timbre, I would need to listen to sound samples containing all of the different
timbral categories I identified and attach user interpretations to them. For the
chords, not only would I have to perform the same analysis as on timbre, but also
pay careful attention to identify which chords correspond to common sound
progressions in popular music, a task that I am not qualified for and did not
have the resources to seek out for this thesis. Second, this final layer of
abstraction was not necessary for the end goal of my paper. In fact,
consolidating my pitch and timbre lexicons into simpler phrases would run the
risk of pigeonholing my analysis and preventing me from discovering more nuanced
patterns in my final results. Therefore, I decided to focus on pitch and timbral
lexicon construction as the furthest levels of abstraction when processing songs
for my thesis. Mathematical details on how I constructed the pitch and timbre
lexicons can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I needed access to an extensive
database of music. Until recently, acquiring a substantial corpus of music data
was a difficult and costly task. It is illegal to download music audio files from
video and music-sharing sites such as YouTube, Spotify, and Pandora. Some
platforms, such as iTunes, offer 90-second previews of songs, but segments of
songs, usually ones that showcase the chorus, are not reliable measures of the
entire essence of a song. Even if I were to legally download entire audio files
for free, I would
run into additional issues. Obtaining a high-quality corpus of song data would be
challenging: writing scripts that crawl music-sharing platforms may not capture
all of the music I am looking for. And once I had the audio files, I would have
to apply audio processing techniques to extract the relevant information from the
songs.
Fortunately, there is an easy solution to the music data acquisition problem.
The Million Song Dataset (MSD) is a collection of metadata for one million music
tracks dating up to 2011. Various organizations, such as The Echo Nest,
Musicbrainz, 7digital, and Last.fm, have contributed different pieces of
metadata. Each song is represented as a Hierarchical Data Format file (HDF5),
which can be loaded as a JSON object. The fields encompass topical features, such
as the song title, artist, and release date, as well as lower-level features,
such as the loudness, starting beat times, pitches, and timbre of several
segments of the song [9]. While the MSD is
the largest free and open-source music metadata dataset I could find, there is no
guarantee that it adequately covers the entire spectrum of EM artists and songs.
This quality limitation is important to consider throughout the study. A quick
look through the songs, including the subset of data I worked with for this
report, showed that there were several well-known artists and songs from the EM
scene. Therefore, while the MSD may not contain all desired songs for this
project, it contains an adequate number of relevant songs to produce some
meaningful results. Additionally, the groundwork for modeling the similarities
between songs and identifying groundbreaking ones is the same regardless of the
songs included, and the following methodologies can be implemented on any
similarly-formatted dataset, including one with songs that might currently be
missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be,
an important aspect of my research. My problem, how to mathematically determine
which songs were unique for their time, requires an algorithm in which each song
is introduced in chronological order, either joining an existing category or
starting a new category based on its musical similarity to songs already
introduced. Clustering algorithms like k-means or Gaussian Mixture Models
(GMMs), which optimize the partitioning of a dataset into a predetermined number
of clusters, assume a fixed number of clusters. While this process would work if
we knew exactly how many genres of EM existed, if we guess wrong our end results
may contain clusters that are wrongly grouped together or separated. It is much
better to apply a clustering algorithm that does not make any assumptions about
this number.
One particularly promising process that addresses the issue of the number of
clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are
useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters, and (2) by sorting the songs
in chronological order before running the algorithm and keeping track of which
songs are categorized under each cluster, we can observe the earliest songs in
each cluster and consequently infer which songs were responsible for creating
new clusters. The DP is controlled by a concentration parameter α. The expected
number of clusters formed is directly proportional to the value of α, so the
higher the value of α, the more likely new clusters will be formed [10].
Regardless of the value of α, as the number of data points introduced increases,
the probability of a new group being formed decreases. That is, a "rich get
richer" policy is in place, and existing clusters tend to grow in size. Tweaking
the value of the tunable parameter α is an important part of the study, since it
determines the flexibility given to forming a new cluster. If the value of α is
too small, then the criteria for forming clusters will be too strict, and data
that should be in different clusters will be assigned to the same cluster. On the
other hand, if α is too large, the algorithm will be too sensitive and assign
similar songs to different clusters.
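The "rich get richer" dynamic and the role of α can be illustrated with a small simulation of the Chinese Restaurant Process, one standard construction of the DP. The function below is an illustrative toy, not code from the thesis:

```python
import random

def crp(n, alpha, seed=0):
    """Simulate cluster assignments under the Chinese Restaurant Process:
    point i joins an existing cluster with probability proportional to
    that cluster's current size, or starts a new cluster with probability
    proportional to alpha."""
    random.seed(seed)
    sizes = []                      # sizes[c] = number of points in cluster c
    for i in range(n):
        # total weight is i (existing points) + alpha (new-cluster mass)
        r = random.uniform(0, i + alpha)
        acc = 0.0
        for c, s in enumerate(sizes):
            acc += s
            if r < acc:
                sizes[c] += 1       # rich get richer: big clusters attract more
                break
        else:
            sizes.append(1)         # new cluster formed

    return sizes

few = crp(1000, alpha=0.5)    # small alpha: only a handful of clusters
many = crp(1000, alpha=50.0)  # large alpha: far more clusters form
```

The expected number of clusters grows roughly like α·ln(1 + n/α), which matches the intuition in the text: more data points make each individual new cluster less likely, but a larger α raises the overall rate of cluster creation.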
The implementation of the DP was achieved using scikit-learn's library and API
for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the
Dirichlet Process model used to cluster the data. More specifically,
scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of
several equally valid methods to assign songs to clusters [11]. While the
mathematical details for this algorithm can be found at the following citation
[12], the most important aspects of the DPGMM are the arguments that the user can
specify and tune. The first of these tunable parameters is the value α, which is
the same parameter as the α discussed in the previous paragraph. As seen on the
right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The
center image has α set to 0.01, which is too small and results in all of the data
being placed under one cluster. On the other hand, the bottom-right image has the
same data set and α set to 100, which does a better job of clustering. On a
related note, the figure also demonstrates the effectiveness of the DPGMM over
the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM
in the top-left image assumes 5 clusters as a prior and consequently clusters the
data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs for the DPGMM is the data that will be
clustered. The scikit-learn implementation takes the data in the format of a
nested list (N lists, each of length m), where N is the number of data points and
m the number of features. While the format of the data structure is relatively
straightforward, choosing which numbers should be in the data was a challenge I
faced. Selecting the relevant features of each song to be used in the algorithm
will be expounded upon in the next section, "Feature Selection".
The last argument that a user inputs for the scikit-learn DPGMM implementation
indicates the upper bound for the number of clusters. The Dirichlet Process then
determines the best number of clusters for the data between 1 and the upper
bound. Since the DPGMM is flexible enough to find the best value, I set an
arbitrary upper bound of 50 clusters and focused more on the tuning of α to
modify the number of clusters formed.
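A minimal sketch of this setup follows. Note that current scikit-learn has replaced the DPGMM class of that era with BayesianGaussianMixture, whose weight_concentration_prior plays the role of α and whose n_components is only an upper bound; the toy data below stands in for the song feature matrix:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# toy stand-in for the feature matrix: N songs x m features, two clear groups
X = np.vstack([rng.normal(0, 0.3, (60, 4)), rng.normal(3, 0.3, (60, 4))])

# n_components is only an upper bound; the stick-breaking prior prunes
# components that the data does not support
dpgmm = BayesianGaussianMixture(
    n_components=50,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,   # the concentration parameter alpha
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
n_used = len(set(labels))   # effective number of clusters actually used
```

With an upper bound of 50 and clearly separated toy data, the fitted model concentrates nearly all of its weight on a few components, which is exactly the behavior exploited in the thesis.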
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the
features to be used for clustering. In other words, when we organize the songs
into clusters,
Figure 2.1: scikit-learn example of GMM vs DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically
and intuitively logical. In the Million Song Dataset [9], each song is
represented as a JSON object containing several fields. These fields are
candidate features for the Dirichlet algorithm. Below is an example song, "Never
Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID
for this artist)
artist_mbtags shape = (4) (this artist received 4 tags on musicbrainzorg)
artist_mbtags_count shape = (4)
(raw tag count of the 4 tags this artist received on musicbrainzorg)
artist_name Rick Astley (artist name)
artist_playmeid 1338 (the ID of that artist on the service playmecom)
artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest
(number between 0 and 1))
artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest
(number between 0 and 1))
15
audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for
the analysis by The Echo Nest)
bars_confidence shape = (99) (confidence value (between 0 and 1) associated
with each bar by The Echo Nest)
bars_start shape = (99) (start time of each bar according to The Echo Nest this
song has 99 bars)
beats_confidence shape = (397) (confidence value (between 0 and 1) associated
with each beat by The Echo Nest)
beats_start shape = (397) (start time of each beat according to The Echo Nest
this song has 397 beats)
danceability 0.0 (danceability measure of this song according to The Echo Nest
(between 0 and 1, 0 => not analyzed))
duration 211.69587 (duration of the track in seconds)
end_of_fade_in 0.139 (time of the end of the fade in at the beginning of the
song according to The Echo Nest)
energy 0.0 (energy measure (not in the signal processing sense) according to The
Echo Nest (between 0 and 1, 0 => not analyzed))
key 1 (estimation of the key the song is in by The Echo Nest)
key_confidence 0.324 (confidence of the key estimation)
loudness -7.75 (general loudness of the track)
mode 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence 0.434 (confidence of the mode estimation)
release Big Tunes - Back 2 The 80s (album name from which the track was taken
some songs tracks can come from many albums we give only one)
release_7digitalid 786795 (the ID of the release (album) on the service 7digi-
talcom)
sections_confidence shape = (10) (confidence value (between 0 and 1) associated
with each section by The Echo Nest)
sections_start shape = (10) (start time of each section according to The Echo
Nest this song has 10 sections)
segments_confidence shape = (935) (confidence value (between 0 and 1) asso-
ciated with each segment by The Echo Nest)
segments_loudness_max shape = (935) (max loudness during each segment)
segments_loudness_max_time shape = (935) (time of the max loudness
during each segment)
segments_loudness_start shape = (935) (loudness at the beginning of each
segment)
segments_pitches shape = (935 12) (chroma features for each segment (normal-
ized so max is 1))
segments_start shape = (935) (start time of each segment ( musical event or
onset) according to The Echo Nest this song has 935 segments)
segments_timbre shape = (935 12) (MFCC-like features for each segment)
similar_artists shape = (100) (a list of 100 artists (their Echo Nest ID) similar
to Rick Astley according to The Echo Nest)
song_hotttnesss 0.864248830588 (according to The Echo Nest, when downloaded
(in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id SOCWJDB12A58A776AF (The Echo Nest song ID note that a song can
be associated with many tracks (with very slight audio differences))
start_of_fade_out 198.536 (start time of the fade out in seconds at the end
of the song according to The Echo Nest)
tatums_confidence shape = (794) (confidence value (between 0 and 1) associated
with each tatum by The Echo Nest)
tatums_start shape = (794) (start time of each tatum according to The Echo
Nest this song has 794 tatums)
tempo 113.359 (tempo in BPM according to The Echo Nest)
time_signature 4 (time signature of the song according to The Echo Nest ie
usual number of beats per bar)
time_signature_confidence 0.634 (confidence of the time signature estimation)
title Never Gonna Give You Up (song title)
track_7digitalid 8707738 (the ID of this song on the service 7digitalcom)
track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track,
on which the analysis was done)
year 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely
yield meaningful results yet also be simple and make sense to the average person.
The definition of "meaningful" results is subjective, as every music listener
will have his or her own opinions as to what constitutes different types of
music, but some common features most people tend to differentiate songs by are
pitch, rhythm, and the types of instruments used. The following specific fields
provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or
note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like
features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor
between songs, since it reveals patterns of notes that occur. Additionally, other
research papers that quantitatively examine songs, like Mauch's, look at pitch
and employ a procedure that allows all songs to be compared with the same metric.
Likewise, timbre is intuitively a reliable differentiating feature, since it
captures tones that sound different from one another despite having the same
pitch. Therefore, segments_timbre is another feature that is considered for each
song.
Finally, we look at the candidate features for rhythm. At first glance, all of
these features appear to be useful, as they indicate the rhythm of a song in one
way or another. However, none of these features are as useful as the pitch and
timbre features. While tempo is one factor in differentiating genres of EDM and
music in general, tempo alone is not a driving force of musical innovation.
Certain genres of EDM, like drum 'n' bass and happycore, stand out for having
very fast tempos, but the tempo is supplemented with a sound unique to the genre.
Conceiving new arrangements of pitches, combining instruments in new ways, and
inventing new types of sounds are novel, but speeding up or slowing down existing
sounds is not. Including tempo as a feature could actually add noise to the
model, since many genres overlap in their tempos. And finally, tempo is measured
indirectly when the pitch and timbre features are normalized for each song:
everything is measured in units of "per second", so faster songs will have higher
quantities of pitch and timbre features each second. Time signature can be
dismissed from the candidate features for the same reason as tempo: many genres
share the same time signature, and including it in the feature set would only add
more noise. beats_start looks like a more promising feature since, like
segments_pitches and segments_timbre, it consists of a vector of values. However,
difficulties arise when we begin to think how exactly
we can utilize this information. Since each song varies in length, we need a way
to compare songs of different durations on the same level. One approach could be
to perform basic statistics on the distance between each beat, for example
calculating the mean and standard deviation of this distance. However, the
normalized pitch and timbre information already capture this data. Another
possibility is detecting certain patterns of beats, which could differentiate the
syncopated dubstep or glitch beats from the steady pulse of electro-house. But
once again, every beat is accompanied by a sound with a specific timbre and
pitch, so this feature would not add any significantly new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to
collect all of the electronic songs in the Million Song Dataset. The easiest
reliable way to achieve this was to iterate through each song in the database and
save the information for the songs where any of the artist genre tags in
artist_mbtags matched an electronic music genre. While this measure was not fully
accurate, because it looks at the genre of the artist, not the song, specific
genre information for each song was not as easily accessible, so this indicator
was nearly as good a substitute. To generate a list of the genres that electronic
songs would fall under, I manually searched through a subset of the MSD to find
all genres that seemed to be related to electronic music.
In the case of genres that were sometimes but not always electronic in nature, such
as disco or pop, I erred on the side of caution and did not include them in the
list of electronic genres. In these cases, false positives, such as primarily
rock songs that happen to have the disco label attached to the artist, could
inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
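A sketch of this filtering step, using plain dictionaries as stand-ins for the HDF5 song objects and an abridged genre set; the helper is_electronic is hypothetical:

```python
# abridged stand-in for the full target_genres list above
target_genres = {'house', 'techno', 'trance', 'dubstep', 'ambient',
                 'breakbeat', 'jungle', 'idm', 'electronic'}

def is_electronic(song):
    """Keep a song if any of its artist-level musicbrainz tags matches the
    electronic genre list (case-insensitive). Note this is an artist-level
    proxy, as described in the text, not a per-song genre label."""
    tags = song.get('artist_mbtags', [])
    return any(t.lower() in target_genres for t in tags)

# toy song records standing in for parsed HDF5 files
songs = [
    {'title': 'Firestarter', 'artist_mbtags': ['Breakbeat', 'big beat']},
    {'title': 'Never Gonna Give You Up', 'artist_mbtags': ['pop', 'dance']},
]
electronic = [s for s in songs if is_electronic(s)]
# only 'Firestarter' survives the filter
```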
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a
musically informed manner. The study first takes the raw sound data and converts
it into a distribution over each pitch, where 0 is no detection of the pitch and
1 the strongest amount. Then it computes the most likely chord by comparing the 4
most common types of chords in popular music (major, minor, dominant 7, and minor
7) to the observed chord. The most common chords are represented as "template
chords" and contain 0's and 1's, where the 1's represent the notes played in the
chord. For example, using the note C as the first index, the C major chord is
represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is
computed against every template chord:

ρ(CT, c) = Σᵢ₌₁¹² (CTᵢ − C̄T)(cᵢ − c̄) / (σ_CT σ_c)

where C̄T is the mean of the values in the template chord, σ_CT is the standard
deviation of those values, and the quantities for c are analogous. Note that the
summation runs over each of the 12 pitch classes. The chord template with the
highest value of ρ is selected as the chord for the time frame.
After this is performed for each time frame, the values are smoothed and the
change between adjacent chords is observed. The reasoning behind this step is
that, by measuring the relative distance between chords rather than the chords
themselves, all songs can be compared in the same manner even though they may
have different key signatures. Finally, the study takes the types of chord
changes and classifies them under 8 possible categories called "H-topics". These
topics are more abstracted versions of the chord changes that make more sense to
a human, such as "changes involving dominant 7th chords".
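The template-matching step can be sketched as follows. This toy version applies the correlation formula above directly to the raw values (Spearman's version would first convert them to ranks) and transposes each of the 4 binary templates to all 12 possible roots:

```python
from statistics import mean, pstdev

# binary templates with root C (index 0): major, minor, dominant 7, minor 7
TEMPLATES = {
    'maj':  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'min':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'dom7': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    'min7': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}

def correlate(t, c):
    """Correlation between a template t and a chroma frame c, as in the
    formula in the text (the constant 1/12 is omitted since only the
    argmax over templates matters)."""
    tm, cm = mean(t), mean(c)
    return sum((ti - tm) * (ci - cm) for ti, ci in zip(t, c)) / (pstdev(t) * pstdev(c))

def best_chord(chroma):
    """Return (root, quality) of the template most correlated with the
    observed 12-bin chroma frame."""
    best, best_rho = None, float('-inf')
    for quality, tpl in TEMPLATES.items():
        for root in range(12):
            rotated = tpl[-root:] + tpl[:-root]   # transpose template up by `root` semitones
            rho = correlate(rotated, chroma)
            if rho > best_rho:
                best, best_rho = (root, quality), rho
    return best

# a frame with strong C, E, G should match C major (root 0)
frame = [1.0, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1, 0.1]
chord = best_chord(frame)
```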
In my preliminary implementation of this method on an electronic dance music
corpus, I made a few modifications to Mauch's study. First, I smoothed out time
frames before computing the most probable chords, rather than smoothing the most
probable chords. I did this to save time and to reduce volatility in the chord
measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which
contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1
second and, for preliminary testing, appeared to be a good interval for each time
block. Second, as mentioned in the literature section, I did not abstract the
chord changes into H-topics. This decision also stemmed from time constraints,
since deriving semantic chord meaning from EDM songs would require careful
research into the types of harmonies and sounds common in that genre of music.
Below is a high-level visualization of the pitch metadata found in a sample song,
"Firestarter" by The Prodigy, and how I converted the metadata into a chord
change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: pitch-processing pipeline illustrated on "Firestarter" by The Prodigy.
Starting from the raw pitch data, an N-by-12 matrix where N is the number of time
frames and 12 the number of pitch classes, the pitch distribution is averaged
over every block of 5 time frames, the most likely chord for each block is
calculated using Spearman's rho, and each pair of adjacent chords (e.g. F major
to G major) is encoded as one of 192 possible chord changes, incrementing its
count in a final 192-element chord_changes frequency vector.]
A final step I took to normalize the chord change data was to divide the counts
by the length of the song, so that each song's number of chord changes was
measured per second.
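Putting the last two steps together, here is a sketch of how chord changes might be encoded and normalized per second. The 192 codes arise from 12 root intervals × 4 × 4 quality pairs; the exact code layout used in the thesis is not specified, so the encoding below is illustrative:

```python
QUALITIES = ['maj', 'min', 'dom7', 'min7']

def change_code(c1, c2):
    """Encode a chord change as one of 192 codes: the pitch-class interval
    between roots (0-11) combined with the (from, to) chord qualities
    (4 x 4 combinations). Illustrative layout, not necessarily the thesis's."""
    (r1, q1), (r2, q2) = c1, c2
    interval = (r2 - r1) % 12
    return interval * 16 + QUALITIES.index(q1) * 4 + QUALITIES.index(q2)

def chord_change_vector(chords, duration_sec):
    """Count each adjacent chord change, then normalize to changes per
    second so that songs of different lengths are comparable."""
    counts = [0] * 192
    for c1, c2 in zip(chords, chords[1:]):
        counts[change_code(c1, c2)] += 1
    return [c / duration_sec for c in counts]

# toy chord sequence: C major -> G major -> A minor -> C major
chords = [(0, 'maj'), (7, 'maj'), (9, 'min'), (0, 'maj')]
vec = chord_change_vector(chords, duration_sec=212.0)
```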
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8]. After collecting all song metadata, I took a
random sample of 20 songs from each year starting at 1970. The reason I forced
the sampling to 20 randomly sampled songs from each year, and did not take a
random sample of songs from all years at once, was to prevent bias towards any
type of sound. As seen in figure 2.2, there are significantly more songs from
2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is
2003, and the standard deviation of the years is σ = 7.060. A "random sample"
over all songs would almost certainly include a disproportionate amount of more
recent songs. In order not to miss out on sounds that may be more prevalent in
older songs, I required a set number of songs from each year. Next, from each
randomly selected song, I selected 20 random timbre frames in order to prevent
any biases in data collection within each song. In total there were
42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre
frames using a Gaussian Mixture Model (GMM), varying the number of clusters from
10 to 100 and selecting the number of clusters with the lowest Bayes Information
Criterion (BIC), a statistical measure commonly used to select the best-fitting
model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46
clusters and saved the mean values of each of the 12 timbre segments for each
cluster formed.
In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs, each song now had the same 46 timbre clusters
but different frequencies in each song. When reading in the metadata from each
song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song.
Finally, as with the pitch data, I divided all observed counts by the duration of
the song in order to normalize each song's timbre counts.
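A simplified sketch of this counting step, assigning each frame to the nearest cluster mean (the thesis uses the GMM's most likely component, which also accounts for covariance) and normalizing by the song's duration; the 3-cluster lexicon stands in for the 46 learned clusters:

```python
def nearest_cluster(frame, means):
    """Assign a 12-d timbre frame to the cluster with the closest mean
    (a Euclidean simplification of the GMM's most-likely-component rule)."""
    dists = [sum((f - m) ** 2 for f, m in zip(frame, mean)) for mean in means]
    return dists.index(min(dists))

def timbre_vector(frames, means, duration_sec):
    """Count how often each timbre cluster occurs in a song, then
    normalize to occurrences per second."""
    counts = [0] * len(means)
    for frame in frames:
        counts[nearest_cluster(frame, means)] += 1
    return [c / duration_sec for c in counts]

# toy lexicon of 3 cluster means in place of the 46 learned ones
means = [[0.0] * 12, [5.0] * 12, [-5.0] * 12]
frames = [[0.1] * 12, [4.8] * 12, [5.2] * 12]
vec = timbre_vector(frames, means, duration_sec=10.0)
```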
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the
data. For each song, I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list, giving each song a total of
238 features. However, there is a problem with this setup: the pitch data will
inherently dominate the clustering process, since it contains almost 3 times as
many features as timbre. While there is no built-in function in scikit-learn's
DPGMM process to give different weights to each feature, I considered another
possibility to remedy this discrepancy: duplicating the timbre vector a certain
number of times and concatenating that to the feature set of each song. While
this strategy runs the risk of corrupting the feature set and turning it into
something that does not accurately represent each song, it is important to keep
in mind that even without duplicating the timbre vector, the feature set consists
of two separate feature sets concatenated to each other. Therefore, timbre
duplication appears to be a reasonable strategy to weight pitch and timbre more
evenly.
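This weighting strategy can be sketched as follows; the choice of 4 copies (giving 4 × 46 = 184 timbre features against 192 pitch features) is an illustrative tuning choice, not a value fixed by the thesis:

```python
def song_features(pitch_vec, timbre_vec, timbre_copies=4):
    """Concatenate the 192-d chord-change vector with several copies of
    the 46-d timbre vector so the two feature groups have comparable
    dimensionality (192 vs 4 x 46 = 184)."""
    return pitch_vec + timbre_vec * timbre_copies

# placeholder zero vectors stand in for a real song's frequencies
features = song_features([0.0] * 192, [0.0] * 46)
# 192 + 4 * 46 = 376 total features per song
```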
After this modification, I tweaked a few more parameters before obtaining my
final results. Dividing the pitch and timbre frequencies by the duration of the
song normalized every song to frequency per second, but it also had the undesired
effect of making the data too small. Timbre and pitch frequencies per second were
almost always less than 10, and many times hovered as low as 0.002 for nonzero
values. Because all of the values were very close to each other, using common
values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs
into different clusters. As a result, every song fell into the same cluster.
Increasing the value of α by several orders of magnitude, to well over 10
million, fixed the problem, but this solution presented two problems. First,
tuning α to experiment with different ways to cluster the music would be
problematic, since I would have to work with an enormous range of possible values
for α. Second, pushing α to such high values is not appropriate for the Dirichlet
Process. Extremely high values of α indicate a Dirichlet Process that will try to
disperse the data into different clusters, but a value of α that high is in
principle always assigning each new song to a new cluster. On the other hand,
varying α between 0.1 and 1000, for example, presents a much wider range of
flexibility when assigning clusters. While clustering may be possible by varying
the values of α an extreme amount with the data as it currently is, we would be
using the Dirichlet Process in a way it should mathematically not be used.
Therefore, multiplying all of the data by a constant value so that we can work in
the appropriate range of α is the ideal approach. After some experimentation, I
found that k = 10 was an appropriate scaling factor.
After initial runs of the Dirichlet Process, I found out
that there was a slight issue with some of the earlier songs Since I had only artist
genre tags not specific song tags for each song I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre including
the generic term rsquoelectronicrsquo There were some bands mostly older ones from the
1960s and 1970s like Electric Light Orchestra which had some electronic music but
28
mostly featured rock funk disco or another genre Given that these artists featured
mostly non-electronic songs I decided to exclude them from my study and generate
a blacklist indicating these music artists While it was infeasible to look through
every single song and determine whether it was electronic or not I was able to look
over the earliest songs in each cluster These songs were the most important to verify
as electronic because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for
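The scaling and artist-filtering steps described above can be sketched as follows. This is a minimal illustration: the per-song dict layout and helper name are hypothetical, the blacklist entry comes from the text, and k is the scaling constant the text reports.

```python
import numpy as np

K = 10  # scaling constant; the text reports k = 10 as an appropriate factor
BLACKLIST = {'Electric Light Orchestra'}  # illustrative entry from the text

def clean_and_scale(songs):
    """songs: list of dicts with 'artist_name' and 'features' (per-second
    frequencies). Drops blacklisted artists and rescales the features so
    that alpha can be tuned in a reasonable range (e.g. 0.1 to 1000)."""
    kept = [s for s in songs if s['artist_name'] not in BLACKLIST]
    for s in kept:
        s['features'] = K * np.asarray(s['features'], dtype=float)
    return kept

songs = [
    {'artist_name': 'Electric Light Orchestra', 'features': [0.002, 0.01]},
    {'artist_name': 'Kraftwerk', 'features': [0.002, 0.01]},
]
cleaned = clean_and_scale(songs)
print(len(cleaned), cleaned[0]['features'])  # 1 song kept, features scaled by 10
```

Multiplying every feature by the same constant leaves the geometry of the data unchanged, so the clustering structure is preserved while α moves back into its mathematically appropriate range.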
The goal of this thesis is to identify the different groups into which EM songs cluster and to identify the most unique artists and genres. While the second task is fairly simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord-change and timbre-category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clusterings, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit on the number of components, or clusters, to 50. The values of α I used resulted in 9, 14, and 19 clusters, respectively.
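The experimental setup above can be sketched as follows. The thesis used scikit-learn's DPGMM class, which has since been removed from the library; the closest current equivalent is BayesianGaussianMixture with a Dirichlet-process weight prior, where α corresponds to the weight_concentration_prior parameter. The feature matrix here is random placeholder data, not the MSD features.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))  # placeholder matrix (songs x features)

results = {}
for alpha in (0.05, 0.1, 0.2):
    dp = BayesianGaussianMixture(
        n_components=50,  # upper limit on clusters, as in the text
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        max_iter=500,
        random_state=0,
    )
    labels = dp.fit_predict(X)
    results[alpha] = len(set(labels))  # clusters actually used
print(results)
```

Larger α makes the stick-breaking prior spread weight over more components, which is why the number of occupied clusters grows from 9 to 14 to 19 across the three runs reported in the text.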
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord-change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
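The per-cluster averages behind these plots can be computed with a simple group-by mean; the function and array names here are illustrative, not taken from the thesis code.

```python
import numpy as np

def cluster_means(features, labels):
    """Average each feature column within each cluster label.
    features: (n_songs, n_categories) array of chord-change / timbre
    frequencies; labels: cluster assignment per song."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

feats = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]])
labs = np.array([0, 0, 1])
means = cluster_means(feats, labs)
print(means[0])  # [2. 3.]
```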
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table listing each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs and, as they did not sound unique, discarded their clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but with echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin; rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, a total of 22 clusters were formed; 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger share of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens; that is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together around the common theme of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a year distribution closer to that of the entire MSD and consists of denser beats; the artist Cabaret Voltaire leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer and pioneer of ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but rather a clever combination of two older genres.

Lastly, we look at the timbre-category and chord-change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category, then playing the sounds and attaching user-based interpretations from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord-change data are very similar for every cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
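The chord-change categories referenced earlier (types 0, 60, 120, and 180 for same-key transitions) follow directly from the encoding used in the preprocessing code (Appendix A.2), where each pair of the 4 chord qualities plus a 0–11 semitone root shift maps to one of 192 categories. A minimal sketch, with the helper name chosen here for illustration:

```python
def chord_change_category(q1, q2, note_shift):
    """q1, q2: chord qualities 1..4 (major, minor, dominant-7th major,
    dominant-7th minor); note_shift: root movement in semitones, 0..11.
    Returns one of 192 chord-change categories."""
    key_shift = 4 * (q1 - 1) + q2          # 1..16 quality transitions
    return 12 * (key_shift - 1) + note_shift

# same-quality, same-root transitions land on types 0, 60, 120, 180
print([chord_change_category(q, q, 0) for q in (1, 2, 3, 4)])  # [0, 60, 120, 180]
```

With this encoding, a peak at category 0, 60, 120, or 180 means consecutive chords kept the same quality and root, i.e. the song stayed in one key.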
While these conclusions were formed mainly from the MSD data and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with the existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed for other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between clusters will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these consisted of only one song each, and upon listening, neither song sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some mapped over nicely while others were more difficult to interpret. For example, cluster 3 of the α = 0.1 run contained a similar number of songs, and a similar distribution of release years, to cluster 9 of the α = 0.05 run. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp, twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced differences in instrumentation and mood. For example, cluster 16 (α = 0.1) contained songs featuring orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, containing violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together by certain shared instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the α = 0.05 clusters. That is, a single cluster with α = 0.05 does not map easily to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre and chord-change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it makes sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, spacey sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong, repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to share the same peak timbre category, were noted for containing strong, repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters mapped easily to clusters for the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also reflect the difficulty of distinguishing clusters: the y-axis values are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.
4.1 Design Flaws in the Experiment
While I made every effort possible to ensure the integrity of this experiment, various factors limited it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. Looking through the tracks, I did see many important artists, meaning that the dataset had some credibility. However, several other artists were surprisingly missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process can be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music-theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had straightforward semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways the feature set could be improved. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue, building an effective corpus of music data for the MSD and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, for example with respect to the popularity of songs and artists, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights from that information, this relatively new field of study will hopefully grow with it. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# mean of a list of values
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_match in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985:
               189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998:
               872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004:
               1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010:
               742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r'title', re.DOTALL)  # pattern garbled in transcription; originally matched whole JSON objects containing 'title'
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except ValueError:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
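To illustrate the template-correlation idea behind find_most_likely_chord, here is a small self-contained sketch. It is simplified to the major-triad templates only and uses numpy's built-in Pearson correlation rather than the hand-rolled sum with its 0.01 stabilizers, so the correlation values differ slightly from the listing above; the function name is illustrative, not part of the thesis code.

```python
import numpy as np

# The 12 major-triad templates are rotations of the C-major shape
MAJOR_TEMPLATES = [np.roll([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], r)
                   for r in range(12)]

def most_likely_major_root(pitch_vector):
    """Return the root (0 = C, ..., 11 = B) whose major-triad template
    correlates most strongly with the 12-bin pitch vector."""
    pv = np.asarray(pitch_vector, dtype=float)
    rhos = [np.corrcoef(t, pv)[0, 1] for t in MAJOR_TEMPLATES]
    return int(np.argmax(np.abs(rhos)))

# A segment dominated by G, B, and D matches the G-major template
print(most_likely_major_root([0, 0, 0.2, 0, 0.1, 0, 0, 0.9, 0, 0, 0.1, 0.8]))  # -> 7
```

The full helper extends this same idea to the minor, dominant-seventh, and minor-seventh template banks and tags the result with a chord-type index.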
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
List of Tables
3.1 Song cluster descriptions for α = 0.05
3.2 Song cluster descriptions for α = 0.1
3.3 Song cluster descriptions for α = 0.2
List of Figures
1.1 A user's taste profile generated by Spotify
1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975
2.1 scikit-learn example of GMM vs. DPGMM and tuning of α
2.2 Number of Electronic Music Songs in Million Song Dataset from Each Year
3.1 Song year distributions for α = 0.05
3.2 Timbre and pitch distributions for α = 0.05
3.3 Song year distributions for α = 0.1
3.4 Timbre and pitch distributions for α = 0.1
3.5 Song year distributions for α = 0.2
3.6 Timbre and pitch distributions for α = 0.2
Chapter 1
Introduction
1.1 Background Information
Electronic Music (EM) is an increasingly popular genre of music with an immense presence and influence on modern culture. Because the genre is new as a whole and arguably more loosely structured than other genres (technology has enabled the creation of a wide range of sounds and the easy blending of existing and new sounds alike), formal analysis, especially mathematical analysis, of the genre is fairly limited and has only begun growing in the past few years. As a fan of EM, I am interested in exploring how the genre has evolved over time. More specifically, my goal with this project was to design some structure or model that could help me identify which EM artists have contributed the most stylistically to the genre. Oftentimes famous EM artists do not create novel-sounding music but rather popularize an existing style, and the motivation of this study is to understand who has stylistically contributed the most to the EM scene versus those who have merely popularized aspects of it.

As the study progressed, the manner in which I constructed my model lent itself to a second goal of the thesis: exploring new ways in which EM genres can be imagined.
While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely because it has existed for a shorter amount of time and is less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not exist until 2012, when Professor Mark J. Butler from Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that allow a user to relate every genre to each other and easily observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped by more general genres and represents chronological evolutions by connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative. While each subgenre contains an explanation of typical rhythm and sounds and includes well-known songs indicative of the style, the guide was created by someone who relied on historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach toward imagining the entire landscape of EM. The results may confirm Ishkur's Guide's findings, in which case his guide is given additional merit with mathematical evidence, or they may differ, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that historically makes sense. On the other hand, my model is history-agnostic and may not realize the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and perhaps reveals different musical styles that are more similar to each other than history would otherwise imply. This way of thinking about music genres, while unconventional, is another way of imagining EM.
The practice of quantitatively analyzing music has exploded in the last decade thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what this research can contribute.
1.2 Literature Review
The quantitative analysis of music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. First looking at the consumer-based research, Spotify and Pandora are two of the most prominent groups, and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the cluster come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music as well as training various algorithms on a regular basis by "listening" to new songs [4][5].
Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres of music. This may serve as a good starting point to compare my final results to, but my study aims to be as context-free as possible, attaching no preconceived notions of music styles or genres and instead looking at features that could be measured in every song.

While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking where skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.
In the academic realm there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time music has been becoming louder and less diverse:
The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
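The transposition-and-discretization step described above can be sketched in a few lines. This is a minimal illustration, not the study's actual implementation: the helper names and the 0.5 threshold are assumptions, and it assumes the convention that key 0 and chroma index 0 both correspond to C, as in the Million Song Dataset.

```python
# Minimal sketch of pitch normalization: rotate a 12-bin chroma vector
# into a common tonal context, then binarize it against a threshold.
# Helper names and the 0.5 threshold are illustrative assumptions.

def transpose_to_c(chroma, key):
    """Rotate a 12-element chroma vector so the song's key maps to C."""
    return [chroma[(i + key) % 12] for i in range(12)]

def discretize(chroma, threshold=0.5):
    """Binarize relative pitch strengths against a threshold."""
    return [1 if c >= threshold else 0 for c in chroma]

# A segment from a song in D (key = 2), strongest on D, F#, and A,
# normalizes to the C-major shape [1,0,0,0,1,0,0,1,0,0,0,0]
segment = [0.1, 0.0, 0.9, 0.0, 0.2, 0.1, 0.8, 0.0, 0.1, 0.7, 0.0, 0.1]
normalized = discretize(transpose_to_c(segment, key=2))
```

In this transposed, binarized form, the chord content of two songs can be compared independently of the keys they were recorded in.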
While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?
A more promising study led by music researcher Matthias Mauch [8] analyzes
contemporary popular Western music from the 1960s to 2010s by comparing numerical
data on the pitches and timbre of a corpus of 17,000 songs that appeared on the
Billboard Hot 100 Like the previously mentioned paper Measuring the Evolution
of Contemporary Western Popular Music Mauch's study also creates abstractions
of pitch and timbre in order to provide a consistent and meaningful semantic
interpretation of musical data (see figure 1.2) However Mauch's study takes this idea a
step further by using genre tags from Last.fm a music website and constructing a
hierarchy of music genres using hierarchical clustering Additionally the study takes
a crack at determining whether a particular band the Beatles was musically
groundbreaking for its time or merely playing off sounds that other bands had already used
Figure 1.2 Data processing pipeline for Mauch's study illustrated with a segment of Queen's Bohemian Rhapsody (1975)
While both Measuring the Evolution of Contemporary Western Popular Music
and Mauch's study created abstractions of pitch and timbre Mauch's study is more
appealing with respect to my goal because its end results align more closely with
mine Additionally the data processing pipeline offers several layers of abstraction
and depending on my progress I would be able to achieve at least one of the levels of
abstraction As shown in figure 1.2 each segment of a raw audio file is first broken
down into its 12 timbre MFCCs and pitch components Next the study constructs
"lexicons" or a dictionary of pitch and timbre terms that all songs can be compared
to For pitch the original data is in a N-by-12 matrix where N is the number of time
segments in the song and 12 the number of each of the notes found in an octave of
pitches Each time segment contains the relative strengths of each of the 12 pitches
However music sounds are not merely a collection of pitches but more precisely
chords Furthermore the similarity of two songs is not determined by the absolute
pitches of their chords but rather the progression of chords in the song all relative to
each other For example if all the notes in a song are transposed by one step the song
will sound different in terms of absolute pitch but the song will still be recognized
as the original because all of the relative movements from each chord to the next
are the same This phenomenon is captured in the pitch data by finding the most
likely chord played at each time segment then counting the change to the next chord
at each time step and generating a table of chord change frequencies for each song
Constructing the timbre lexicon is more complicated since there is no easy analogue
to chords by which to compare songs Mauch's study utilizes a Gaussian Mixture
Model (GMM) by iterating over k=1 to k=N clusters where N is a large number
running the GMM on each prior assumption of k clusters and computing the Bayes
Information Criterion (BIC) for each model The lowest of the N BIC values is found
and that value of k is selected That model contains k different timbre clusters
and each cluster contains the mean timbre value for each of the 12 timbre components
For my research I decided that the pitch and timbre lexicons would be the most
realistic level of abstraction I could obtain Mauch's study adds an additional layer
to pitch and timbre by identifying the most common patterns of chord changes and
most common timbre rhythms and creating more general tags from these combined
terms such as "stepwise changes indicating modal harmony" for a pitch topic and
"oh rounded mellow" for a timbral topic There were two problems with using this
final layer of abstraction for my study First attaching semantic interpretations to
the pitch and timbral lexicons is a difficult task For timbre I would need to listen
to sound samples containing all of the different timbral categories I identified and
attach user interpretations to them For the chords not only would I have to
perform the same analysis as on timbre but also pay careful attention to identify which
chords correspond to common sound progressions in popular music a task that I am
not qualified for and did not have the resources to pursue for this thesis Second
this final layer of abstraction was not necessary for the end goal of my paper In
fact consolidating my pitch and timbre lexicons into simpler phrases would run the
risk of pigeonholing my analysis and preventing me from discovering more nuanced
patterns in my final results Therefore I decided to focus on pitch and timbral
lexicon construction as the furthest levels of abstraction when processing songs for
my thesis Mathematical details on how I constructed the pitch and timbral lexicons
can be found in the Mathematical Modeling section of this paper
1.3 The Dataset
In order to successfully execute my thesis I need access to an extensive database of
music Until recently acquiring a substantial corpus of music data was a difficult and
costly task It is illegal to download music audio files from video and music-sharing
sites such as YouTube Spotify and Pandora Some platforms such as iTunes offer
90-second previews of songs but using only segments of songs and usually segments
that showcase the chorus of the song are not reliable measures to capture the entire
essence of a song Even if I were to legally download entire audio files for free I would
run into additional issues Obtaining a high-quality corpus of song data would be
challenging writing scripts that crawl music sharing platforms may not capture all of
the music I am looking for And once I have the audio files I would have to perform
audio processing techniques to extract the relevant information from the songs
Fortunately there is an easy solution to the music data acquisition problem
The Million Song Dataset (MSD) is a collection of metadata for one million music
tracks dating up to 2011 Various organizations such as The Echo Nest Musicbrainz
7digital and Last.fm have contributed different pieces of metadata Each song is
represented as a Hierarchical Data Format (HDF5) file whose contents can be loaded
as a JSON-like object The fields encompass topical features such as the song title artist
and release date as well as lower-level features such as the loudness starting beat
time pitches and timbre of several segments of the song [9] While the MSD is
the largest free and open source music metadata dataset I could find there is no
guarantee that it adequately covers the entire spectrum of EM artists and songs
This quality limitation is important to consider throughout the study A quick look
through the songs including the subset of data I worked with for this report showed
that there were several well-known artists and songs in the EM scene Therefore
while the MSD may not contain all desired songs for this project it contains an
adequate number of relevant songs to produce some meaningful results Additionally
laying the groundwork for modeling the similarities between songs and identifying
groundbreaking ones is the same regardless of the songs included and the following
methodologies can be implemented on any similarly-formatted dataset including one
with songs that might currently be missing in the MSD
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was and continues to be
an important aspect of my research My problem how to mathematically determine
which songs were unique for their time requires an algorithm in which each song is
introduced in chronological order either joining an existing category or starting a
new category based on its musical similarity to songs already introduced Clustering
algorithms like k-means or Gaussian Mixture Models (GMM) optimize the partitioning
of a dataset into a predetermined number of clusters and thus assume a fixed
number of clusters While this process would work if we knew
exactly how many genres of EM existed if we guess wrong our end results may end
up with clusters that are wrongly grouped together or separated It is much better to
apply a clustering algorithm that does not make any assumptions about this number
One particularly promising process that addresses the issue of the number of
clusters is a family of algorithms known as Dirichlet Processes (DPs) DPs are
useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters and (2) by sorting the songs
in chronological order before running the algorithm and keeping track of which
songs are categorized under each cluster we can observe the earliest songs in each
cluster and consequently infer which songs were responsible for creating new
clusters The DP is controlled by a parameter α which
is the concentration parameter The expected number of clusters formed is directly
proportional to the value of α so the higher the value of α the more likely new
clusters will be formed [10] Regardless of the value of α as the number of data
points introduced increases the probability of a new group being formed decreases
That is a "rich get richer" policy is in place and existing clusters tend to grow in
size Tweaking the value of the tunable parameter α is an important part of the
study since it determines the flexibility given to forming a new cluster If the value
of α is too small then the criteria for forming clusters will be too strict and data
that should be in different clusters will be assigned to the same cluster On the other
hand if α is too large the algorithm will be too sensitive and assign similar songs to
different clusters
The implementation of the DP was achieved using scikit-learn's library and API for
Dirichlet Process Gaussian Mixture Model (DPGMM) The DPGMM is the formal
name of the Dirichlet Process model used to cluster the data More specifically
scikit-learn's implementation of the DPGMM uses the Stick Breaking method
one of several equally valid methods to assign songs to clusters [11] While the
mathematical details for this algorithm can be found at the following citation [12]
the most important aspects of the DPGMM are the arguments that the user can
specify and tune The first of these tunable parameters is the value α which is the
same parameter as the α discussed in the previous paragraph As seen in Figure 2.1
on the right side properly tuning α is key to obtaining meaningful clusters The
center image has α set to 0.01 which is too small and results in all of the data being
formed under one cluster On the other hand the bottom-right image has the same
data set and α set to 100 which does a better job of clustering On a related note
the figure also demonstrates the effectiveness of the DPGMM over the GMM On the
left side clearly the dataset contains 2 clusters but the GMM on the top-left image
assumes 5 clusters as a prior and consequently clusters the data incorrectly while
the DPGMM manages to limit the data to 2 clusters
The second argument that the user inputs for the DPGMM is the data that
will be clustered The scikit-learn implementation takes the data in the format
of a nested list (N lists each of length m) where N is the number of data points
and m the number of features While the format of the data structure is relatively
straightforward choosing which numbers should be in the data was a challenge I
faced Selecting the relevant features of each song to be used in the algorithm will
be expounded upon in the next section "Feature Selection"
The last argument that a user inputs for the scikit-learn DPGMM implementation
is an argument indicating the upper bound for the number of clusters The
Dirichlet Process then determines the best number of clusters for the data between
1 and the upper bound Since the DPGMM is flexible enough to find the best value
I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to
modify the number of clusters formed
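In current scikit-learn releases the DPGMM class used here has been removed; its successor, BayesianGaussianMixture with a Dirichlet-process (stick-breaking) prior, exposes the same three choices described above: the concentration parameter α, the data, and the upper bound on the number of clusters. A sketch on synthetic two-cluster data:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in for the song feature vectors (two obvious clusters)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 0.5, size=(100, 2)),
               rng.normal(+3.0, 0.5, size=(100, 2))])

# Modern scikit-learn replaces the DPGMM class with BayesianGaussianMixture
# using a stick-breaking (Dirichlet process) prior on the mixture weights
dp = BayesianGaussianMixture(
    n_components=50,                  # upper bound on the number of clusters
    weight_concentration_prior=1.0,   # the concentration parameter alpha
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

labels = dp.predict(X)
print(len(set(labels)))   # effective number of clusters actually used
```

Even with the upper bound at 50, the fitted model concentrates its weight on only the clusters the data supports, which is exactly the flexibility motivated above.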
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features
to be used for clustering In other words when we organize the songs into clusters
Figure 2.1 scikit-learn example of GMM vs DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and
intuitively logical In the Million Song Dataset [9] each song is represented as a
JSON object containing several fields These fields are candidate features to be used
in the Dirichlet algorithm Below is an example song "Never Gonna Give You Up"
by Rick Astley and the corresponding features
artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID
for this artist)
artist_mbtags shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count shape = (4)
(raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name Rick Astley (artist name)
artist_playmeid 1338 (the ID of that artist on the service playme.com)
artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest
(number between 0 and 1))
artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest
(number between 0 and 1))
audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for
the analysis by The Echo Nest)
bars_confidence shape = (99) (confidence value (between 0 and 1) associated
with each bar by The Echo Nest)
bars_start shape = (99) (start time of each bar according to The Echo Nest this
song has 99 bars)
beats_confidence shape = (397)
(confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start shape = (397) (start time of each beat according to The Echo Nest
this song has 397 beats)
danceability 0.0 (danceability measure of this song according to The Echo Nest
(between 0 and 1 0 => not analyzed))
duration 211.69587 (duration of the track in seconds)
end_of_fade_in 0.139 (time of the end of the fade in at the beginning of the
song according to The Echo Nest)
energy 0.0 (energy measure (not in the signal processing sense) according to The
Echo Nest (between 0 and 1 0 => not analyzed))
key 1 (estimation of the key the song is in by The Echo Nest)
key_confidence 0.324 (confidence of the key estimation)
loudness -7.75 (general loudness of the track)
mode 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence 0.434 (confidence of the mode estimation)
release Big Tunes - Back 2 The 80s (album name from which the track was taken
some songs tracks can come from many albums we give only one)
release_7digitalid 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence shape = (10) (confidence value (between 0 and 1) associated
with each section by The Echo Nest)
sections_start shape = (10) (start time of each section according to The Echo
Nest this song has 10 sections)
segments_confidence shape = (935) (confidence value (between 0 and 1)
associated with each segment by The Echo Nest)
segments_loudness_max shape = (935) (max loudness during each segment)
segments_loudness_max_time shape = (935) (time of the max loudness
during each segment)
segments_loudness_start shape = (935) (loudness at the beginning of each
segment)
segments_pitches shape = (935 12) (chroma features for each segment
(normalized so max is 1))
segments_start shape = (935) (start time of each segment ( musical event or
onset) according to The Echo Nest this song has 935 segments)
segments_timbre shape = (935 12) (MFCC-like features for each segment)
similar_artists shape = (100) (a list of 100 artists (their Echo Nest ID) similar
to Rick Astley according to The Echo Nest)
song_hotttnesss 0.864248830588 (according to The Echo Nest when downloaded
(in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 and 1))
song_id SOCWJDB12A58A776AF (The Echo Nest song ID note that a song can
be associated with many tracks (with very slight audio differences))
start_of_fade_out 198.536 (start time of the fade out in seconds at the end
of the song according to The Echo Nest)
tatums_confidence shape = (794) (confidence value (between 0 and 1) associated
with each tatum by The Echo Nest)
tatums_start shape = (794) (start time of each tatum according to The Echo
Nest this song has 794 tatums)
tempo 113.359 (tempo in BPM according to The Echo Nest)
time_signature 4 (time signature of the song according to The Echo Nest ie
usual number of beats per bar)
time_signature_confidence 0.634 (confidence of the time signature estimation)
title Never Gonna Give You Up (song title)
track_7digitalid 8707738 (the ID of this song on the service 7digitalcom)
track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track
on which the analysis was done)
year 1987 (year when this song was released according to musicbrainz.org)
When choosing features my main goal was to use features that would most
likely yield meaningful results yet also be simple and make sense to the average
person The definition of "meaningful" results is subjective as every music listener
will have his or her own opinions as to what constitutes different types of music but some
common features most people tend to differentiate songs by are pitch rhythm and
the types of instruments used The following specific fields provided in each song
object fall under these three terms
Pitch
• segments_pitches a matrix of values indicating the strength of each pitch (or
note) at each discernible time interval
Rhythm
• beats_start a vector of values indicating the start time of each beat
• time_signature the time signature of the song
• tempo the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre a matrix of values indicating the distribution of MFCC-like
features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor for
songs since it reveals patterns of notes that occur Additionally other research
papers that quantitatively examine songs like Mauch's look at pitch and employ a
procedure that allows all songs to be compared with the same metric Likewise
timbre is intuitively a reliable differentiating feature since it reveals the presence
of different tones that is sounds that differ despite having the same pitch
Therefore segments_timbre is another feature that is considered in each song
Finally we look at the candidate features for rhythm At first glance all of these
features appear to be useful as they indicate the rhythm of a song in one way or
another However none of these features are as useful as the pitch and timbre
features While tempo is one factor in differentiating genres of EDM and music in
general tempo alone is not a driving force of musical innovation Certain genres
of EDM like drum 'n' bass and happycore stand out for having very fast tempos
but the tempo is supplemented with a sound unique to the genre Conceiving new
arrangements of pitches combining instruments in new ways and inventing new
types of sounds are novel but speeding up or slowing down existing sounds is not
Including tempo as a feature could actually add noise to the model since many genres
overlap in their tempos And finally tempo is measured indirectly when the pitch
and timbre features are normalized for each song everything is measured in units of
ldquoper secondrdquo so faster songs will have higher quantities of pitch and timbre features
each second Time signature can be dismissed from the candidate features for the
same reason as tempo many genres contain the same time signature and including
it in the feature set would only add more noise beats_start looks like a more
promising feature since like segments_pitches and segments_timbre it consists of
a vector of values However difficulties arise when we begin to think how exactly
we can utilize this information Since each song varies in length we need a way to
compare songs of different durations on the same level One approach could be to
perform basic statistics on the distance between each beat for example calculating
the mean and standard deviation of this distance However the normalized pitch
and timbre information already capture this data Another possibility is detecting
certain patterns of beats which could differentiate the syncopated dubstep or glitch
music beats from the steady pulse of electro-house But once again every beat is
accompanied by a sound with a specific timbre and pitch so this feature would not
add any significantly new information
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research I first needed to collect
all of the electronic songs in the Million Song Dataset The easiest reliable way to
achieve this was to iterate through each song in the database and save the information
for the songs where any of the artist genre tags in artist_mbtags matched with an
electronic music genre While this measure was not fully accurate because it looks at
the genre of the artist rather than the song song-specific genre information was not
as easily accessible so this artist-level indicator was a reasonable substitute To generate a
list of the genres that electronic songs would fall under I manually searched through
a subset of the MSD to find all genres that seemed to be related to electronic music
In the case of genres that were sometimes but not always electronic in nature such
as disco or pop I erred on the side of caution and did not include them in the list
of electronic genres In these cases false positives such as primarily rock songs that
happen to have the disco label attached to the artist could inadvertently be included
in the dataset The final list of genres is as follows
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
                 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
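The artist-level filter described above can be sketched as follows; the helper name and the abbreviated tag set are illustrative, not taken from the thesis code:

```python
# Abbreviated tag filter: keep a song if any of its artist's musicbrainz
# genre tags appears in the target list (helper name is hypothetical)
target = {'house', 'techno', 'drum and bass', 'jungle', 'breakbeat', 'trance',
          'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
          '8-bit', 'ambient', 'dance and electronica', 'electronic'}

def is_electronic(artist_mbtags):
    """True if any artist-level tag matches an electronic genre."""
    return any(tag.lower() in target for tag in artist_mbtags)

print(is_electronic(['Techno', 'german']))      # True
print(is_electronic(['rock', 'classic rock']))  # False
```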
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a
musically informed manner The study first takes the raw sound data and converts it into
a distribution of each pitch where 0 is no detection of the pitch and 1 the strongest
amount Then it computes the most likely chord by comparing the 4 most
common types of chords in popular music (major minor dominant 7 and minor 7) to
the observed chord The most common chords are represented as "template chords"
and contain 0rsquos and 1rsquos where the 1rsquos represent the notes played in the chord For
example using the note C as the first index the C major chord is represented as
CT_Cmaj = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song the Spearman's rho coefficient is
computed over every template chord

ρ_{CT,c} = (1/12) Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)
where C̄T is the mean of the values in the template chord σ_CT is the standard
deviation of the values in the chord and the operations on c are analogous Note
that the summation is over each individual pitch in the 12 pitch classes The chord
template with the highest value of ρ is selected as the chord for the time frame
After this is performed for each time frame the values are smoothed and then the
change between adjacent chords is observed The reasoning behind this step is that
by measuring the relative distance between chords rather than the chords themselves
all songs can be compared in the same manner even though they may have different
key signatures Finally the study takes the types of chord changes and classifies
them under 8 possible categories called "H-topics" These topics are more abstracted
versions of the chord changes that make more sense to a human such as "changes
involving dominant 7th chords"
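The template-matching step can be sketched as follows; the template set (4 chord qualities rolled through 12 roots) follows the description above, while the example frame values and tie-breaking are illustrative:

```python
import numpy as np
from scipy.stats import spearmanr

# The 4 template chords rooted at C (major, minor, dominant 7, minor 7);
# rolling each through the 12 pitch classes yields 4 x 12 = 48 templates
base = {
    'maj':  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'min':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'dom7': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    'min7': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}
templates = {(quality, root): np.roll(vec, root)
             for quality, vec in base.items() for root in range(12)}

def best_chord(chroma):
    """Pick the template chord with the highest Spearman rho vs the frame."""
    return max(templates, key=lambda t: spearmanr(templates[t], chroma)[0])

# A chroma frame dominated by C#, F# and A# should match F# major (root 6)
frame = np.array([0.1, 1.0, 0.1, 0.1, 0.1, 0.1, 1.0, 0.1, 0.1, 0.1, 1.0, 0.1])
print(best_chord(frame))
```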
In my preliminary implementation of this method on an electronic dance music
corpus I made a few modifications to Mauchrsquos study First I smoothed out time
frames before computing the most probable chords rather than smoothing the most
probable chords I did this to save time and to reduce volatility in the chord
measurements Using Rick Astley's "Never Gonna Give You Up" as a reference
which contains 935 time frames and lasts 212 seconds 5 time frames is slightly
under 1 second and for preliminary testing appeared to be a good interval for each
time block Second as mentioned in the literature section I did not abstract the
chord changes into H-topics This decision also stemmed from time constraints since
deriving semantic chord meaning from EDM songs would require careful research
into the types of harmonies and sounds common in that genre of music Below I
included a high-level visualization of the pitch metadata found in a sample song
"Firestarter" by The Prodigy and how I converted the metadata into a chord change
vector that I could then feed into the Dirichlet Process algorithm
[Figure: pitch-processing pipeline illustrated with the first time frames of
"Firestarter" by The Prodigy — (1) start with the raw pitch data, an N x 12
matrix where N is the number of time frames and 12 the number of pitch classes;
(2) average the distribution of pitches over every 5 time frames; (3) calculate
the most likely chord for each block using Spearman's rho, e.g. F# major
(0,1,0,0,0,0,1,0,0,0,1,0); (4) for each pair of adjacent chords calculate the
change between them, e.g. F# major to G# major is a major-to-major change with
step size 2 and chord shift code 6, and increment the count in a table of the
192 possible chord changes; (5) the result is a 192-element vector chord_changes
where chord_changes[i] is the number of times the chord change with code i
occurred in the song]
A final step I took to normalize the chord change data was to divide the numbers by
the length of the song so that each song's number of chord changes was measured per
second
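The chord-change counting and per-second normalization can be sketched together; the 192 codes correspond to 4 from-qualities × 4 to-qualities × 12 root steps, though the specific code numbering is an assumption here:

```python
import numpy as np

QUALITIES = ['maj', 'min', 'dom7', 'min7']

def change_code(prev, curr):
    """Encode a chord change as one of 192 codes
    (4 from-qualities x 4 to-qualities x 12 root steps);
    the numbering scheme itself is an assumption."""
    (q1, r1), (q2, r2) = prev, curr
    step = (r2 - r1) % 12
    return (QUALITIES.index(q1) * 4 + QUALITIES.index(q2)) * 12 + step

def chord_change_vector(chords, duration_sec):
    """Count each chord change, then normalize to changes per second."""
    counts = np.zeros(192)
    for prev, curr in zip(chords, chords[1:]):
        counts[change_code(prev, curr)] += 1
    return counts / duration_sec

chords = [('maj', 6), ('maj', 8), ('min', 8), ('maj', 6)]
vec = chord_change_vector(chords, duration_sec=3.0)
print(vec.shape, vec.sum())
```

Three chord changes over a hypothetical 3-second span yield a 192-element vector whose entries sum to one change per second.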
2.3.3 Timbre Preprocessing
For timbre I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8] After collecting all song metadata I took a random
sample of 20 songs from each year starting at 1970 The reason I forced the sampling
to 20 randomly sampled songs from each year and did not take a random sample of
songs from all years at once was to prevent bias towards any type of sounds As seen
in figure 2.2 there are significantly more songs from 2000-2011 than before 2000 The
mean year is x̄ = 2001.052 the median year is 2003 and the standard deviation of the
years is σ = 7.060 A "random sample" over all songs would almost definitely include
a disproportionate amount of more recent songs In order to not miss out on sounds
that may be more prevalent in older songs I required a set number of songs from each
year Next from each randomly selected song I selected 20 random timbre frames
in order to prevent any biases in data collection within each song In total there
were 42 × 20 × 20 = 16,800 timbre frames collected Next I clustered the timbre frames
using a Gaussian Mixture Model (GMM) varying the number of clusters from 10 to
100 and selecting the number of clusters with the lowest Bayes Information Criterion
(BIC) a statistical measure commonly used to calculate the best fitting model The
BIC was minimized at 46 timbre clusters I then re-ran the GMM with 46 clusters
and saved the mean values of each of the 12 timbre segments for each cluster formed
In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs each song now had the same 46 timbre clusters
but different frequencies in each song When reading in the metadata from each song
I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2 Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song Finally
as with the pitch data I divided all observed counts by the duration of the song in
order to normalize each song's timbre counts
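The frame-assignment and normalization steps can be sketched as follows; the lexicon here is fit on random stand-in data rather than the 16,800 sampled frames, with 46 clusters fixed to the value the BIC selected:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Random stand-in for the sampled 12-dimensional timbre frames;
# 46 clusters is the number the BIC selected in this chapter
rng = np.random.default_rng(2)
lexicon_frames = rng.normal(size=(2000, 12))
gmm = GaussianMixture(n_components=46, random_state=0).fit(lexicon_frames)

def timbre_vector(song_frames, duration_sec):
    """Per-second frequency of each of the 46 timbre clusters in one song."""
    labels = gmm.predict(song_frames)             # most likely cluster per frame
    counts = np.bincount(labels, minlength=46).astype(float)
    return counts / duration_sec

song_frames = rng.normal(size=(935, 12))          # e.g. 935 frames, 212 seconds
vec = timbre_vector(song_frames, duration_sec=212.0)
print(vec.shape)
```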
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed I ran the Dirichlet Process on the
data For each song I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list giving each song a total of 238
features However there is a problem with this setup The pitch data will inherently
dominate the clustering process since it contains over four times as many features
as timbre While there is no built-in function in scikit-learnrsquos DPGMM process to
give different weights to each feature I considered another possibility to remedy
this discrepancy duplicating the timbre vector a certain number of times and
concatenating that to the feature set of each song While this strategy runs the risk
of corrupting the feature set and turning it into something that does not accurately
represent each song it is important to keep in mind that even without duplicating
the timbre vector the feature set consists of two separate feature sets concatenated
to each other Therefore timbre duplication appears to be a reasonable strategy to
weigh pitch and timbre more evenly
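A sketch of the combined feature vector; the number of timbre copies (4, so that the 46 timbre features roughly balance the 192 pitch features) and the constant scaling factor of 10 applied before clustering are illustrative choices drawn from the discussion in this chapter:

```python
import numpy as np

def song_features(pitch_vec, timbre_vec, timbre_copies=4, scale=10.0):
    """Concatenate chord-change and timbre frequency vectors into one
    feature vector. Duplicating the 46-element timbre vector ~4 times
    roughly balances it against the 192 pitch features, and the constant
    scale factor (k = 10 in the text) lifts the tiny per-second values
    into a range where alpha can be tuned between about 0.1 and 1000."""
    return scale * np.concatenate([np.asarray(pitch_vec)] +
                                  [np.asarray(timbre_vec)] * timbre_copies)

x = song_features(np.zeros(192), np.ones(46))
print(x.shape)   # 192 + 4 * 46 = 376 features
```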
After this modification I tweaked a few more parameters before obtaining my
final results Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second but it also had the undesired effect
of making the data too small Timbre and pitch frequencies per second were almost
always less than 10 and many times hovered as low as 0.002 for nonzero values
Because all of the values were very close to each other using common values of α
in the range of 0.1 to 1000-2000 was insufficient to push the songs into different
clusters As a result every song fell into the same cluster Increasing the value
of α by several orders of magnitude to well over 10 million fixed the problem but
this solution presented two problems First tuning α to experiment with different
ways to cluster the music would be problematic since I would have to work with
an enormous range of possible values for α Second pushing α to such high values
is not appropriate for the Dirichlet Process Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters but a value
of α that high is in principle always assigning each new song to a new cluster On
the other hand varying α between 01 and 1000 for example presents a much wider
range of flexibility when assigning clusters While this may be possible by varying
the values of α an extreme amount with the data as it currently is we are using
the Dirichlet Process in a way it should mathematically not be used Therefore
multiplying all of the data by a constant value so that we can work in the appropriate
range of α is the ideal approach After some experimentation I found that k=10 was
an appropriate scaling factor After initial runs of the Dirichlet Process I found out
that there was a slight issue with some of the earlier songs Since I had only artist
genre tags not specific song tags for each song I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre including
the generic term rsquoelectronicrsquo There were some bands mostly older ones from the
1960s and 1970s like Electric Light Orchestra which had some electronic music but
28
mostly featured rock funk disco or another genre Given that these artists featured
mostly non-electronic songs I decided to exclude them from my study and generate
a blacklist indicating these music artists While it was infeasible to look through
every single song and determine whether it was electronic or not I was able to look
over the earliest songs in each cluster These songs were the most important to verify
as electronic because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for
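A minimal sketch of this blacklist filter, assuming songs are represented as dicts with an 'artist_name' key as in the code of Appendix A (the blacklist contents here are illustrative, using artists named in the text):

```python
# Artists whose catalogs are mostly non-electronic, to be excluded.
BLACKLIST = {'Electric Light Orchestra'}

def filter_blacklisted(songs, blacklist=BLACKLIST):
    """Drop any song whose artist is on the blacklist."""
    return [s for s in songs if s['artist_name'] not in blacklist]

songs = [{'artist_name': 'Electric Light Orchestra', 'year': 1977},
         {'artist_name': 'Kraftwerk', 'year': 1978}]
print(filter_blacklisted(songs))  # only the Kraftwerk entry remains
```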
The goal of this thesis is to identify different groups in which EM songs are
clustered and to identify the most unique artists and genres. While the second task is
very simple, because it requires looking at the earliest songs in each cluster, the
effectiveness of the first is difficult to gauge. While I can look at the average chord change
and timbre category frequencies in each cluster, as well as other metadata, attaching
semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = 0.05, 0.1, and 0.2 and
compared the clustering for each value, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit of components, or clusters, allowed to 50. The values of α I used
resulted in 9, 14, and 19 significant clusters formed, respectively.
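The procedure above (fit a truncated Dirichlet Process mixture with at most 50 components for several values of α, then count the clusters actually used) can be sketched with scikit-learn. The thesis used the old `sklearn.mixture.DPGMM` class, which has since been removed from scikit-learn; this sketch substitutes its replacement, `BayesianGaussianMixture` with a Dirichlet-process prior, and runs on stand-in random features rather than the real song data:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
X = 10 * rng.rand(300, 238)  # stand-in for the 238 song features, scaled by k = 10

def cluster_counts(X, alphas, max_components=50):
    """Fit a truncated Dirichlet Process mixture for each alpha and
    return how many clusters the model actually used."""
    counts = {}
    for alpha in alphas:
        model = BayesianGaussianMixture(
            n_components=max_components,                 # truncation level
            weight_concentration_prior_type='dirichlet_process',
            weight_concentration_prior=alpha,            # the DP alpha
            covariance_type='diag',
            random_state=0)
        labels = model.fit_predict(X)
        counts[alpha] = len(np.unique(labels))
    return counts

print(cluster_counts(X, (0.05, 0.1, 0.2)))
```

On the real features, larger α values push the model to spread songs over more of the 50 available components, which is the behavior exploited in the Findings below.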
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are
skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions
of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is
shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and since they did not sound unique,
I discarded them from the clusters). Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them
for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process these songs were lumped together with the common theme of dense, melodic
sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists among the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and
electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th
note rhythm that, combined with the ethereal synths and their chord progressions,
sounded very similar to trance music. In fact, I found that stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres.

Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change; type 60, minor → minor with no note change; type
120, dominant 7th major → dominant 7th major with no note change; and type 180,
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that chords occurring next to each other in a song are remaining in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to intuitively interpret. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category, then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal
it was not practical in mine. I ended up comparing my subjective
summaries of each cluster with the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data is very similar for each cluster. This problem does not
occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
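The chord-change category numbering used in this discussion (types 0, 60, 120, and 180) follows the encoding computed in Appendix A.2, where chord types are indexed 1 through 4 (major, minor, dominant 7th major, dominant 7th minor) and the root-note shift runs from 0 to 11 semitones. A minimal sketch:

```python
def chord_change_category(type1, type2, note_shift):
    """Map a chord change to one of the 192 categories.
    type1, type2 in 1..4 (major, minor, dom-7th major, dom-7th minor);
    note_shift in 0..11 semitones."""
    key_shift = 4 * (type1 - 1) + type2          # 1..16 type-pair index
    return 12 * (key_shift - 1) + note_shift     # 0..191

print(chord_change_category(1, 1, 0))  # 0: major -> major, same note
print(chord_change_category(2, 2, 0))  # 60: minor -> minor, same note
print(chord_change_category(4, 4, 0))  # 180: dom-7th minor -> dom-7th minor
```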
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that from a quantitatively computed perspective I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value but also comparing the clusters formed
to those for other values of α. Since we are increasing the value of α, more clusters will
be formed and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely, while others were more difficult to interpret. For example, cluster
3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs and a similar
distribution of release years to cluster 9_{0.05}. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin", for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 grouping differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked
up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1}
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together based on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and the characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR drum machines, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this run, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters. 3 of these clusters contained only one song each; upon listening
to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters for the other two α values, like cluster 17_{0.2}, which contains Roland TR
drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster
often vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also illustrate the difficulty
of distinguishing different clusters. The y-axis values for all of the charts are quite small,
implying that many of the timbre values averaged out because the songs were quite
different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions
were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small fraction of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; I then offer potential paths for researchers to build upon my
experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to all of the electronic
music that was available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included contained
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayes Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed were nontrivial: I could not only
eyeball the measurements on each timbre graph, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is at clustering songs into distinct categories.
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions, procedures,
evaluations of whether clustering was effective, and musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. Addressing the greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be accomplished by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small group research project without
extensive funding and influence. Once these problems are resolved, with songs
accessible from the dataset and methods for comparing songs to each other in place,
the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even more, and groups such as Spotify amass greater amounts of information
and deeper insights on that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly approached from a curious outsider's perspective: qualitatively
described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for each electronic music song
out of the Million Song Dataset.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# match each song's metadata dict in the raw file
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0]-1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean value of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{.*?'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in
                            segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec. 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
List of Figures

1.1 A user's taste profile generated by Spotify 4

1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975 8

2.1 scikit-learn example of GMM vs. DPGMM and tuning of α 15

2.2 Number of Electronic Music Songs in Million Song Dataset from Each Year 26

3.1 Song year distributions for α = 0.05 31

3.2 Timbre and pitch distributions for α = 0.05 32

3.3 Song year distributions for α = 0.1 35

3.4 Timbre and pitch distributions for α = 0.1 37

3.5 Song year distributions for α = 0.2 41

3.6 Timbre and pitch distributions for α = 0.2 44
Chapter 1
Introduction
1.1 Background Information
Electronic Music (EM) is an increasingly popular genre of music with an immense presence and influence on modern culture. Because the genre is new as a whole and is arguably more loosely structured than other genres (technology has enabled the creation of a wide range of sounds and easy blending of existing and new sounds alike), formal analysis, especially mathematical analysis, of the genre is fairly limited and has only begun growing in the past few years. As a fan of EM, I am interested in exploring how the genre has evolved over time. More specifically, my goal with this project was to design some structure or model that could help me identify which EM artists have contributed the most stylistically to the genre. Oftentimes, famous EM artists do not create novel-sounding music but rather popularize an existing style, and the motivation of this study is to distinguish those who have stylistically contributed the most to the EM scene from those who have merely popularized aspects of it.

As the study progressed, the manner in which I constructed my model lent itself to a second goal of the thesis: proposing new ways in which we can imagine EM genres.
While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, and especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely because it has existed for a shorter amount of time and is less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not exist until 2012, when Professor Mark J. Butler from Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that allow a user to relate every genre to each other and easily observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped by more general genres and represents chronological evolutions by connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative. While each subgenre contains an explanation of typical rhythm and sounds and includes well-known songs indicative of the style, the guide was created by someone drawing on historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach to imagining the entire landscape of EM. The results may confirm Ishkur's Guide's findings, in which case his guide is given additional merit with mathematical evidence, or they may differ, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that historically makes sense. On the other hand, my model is history-agnostic and may not realize the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and it may reveal musical styles that are more similar to each other than history would otherwise imply. This way of thinking about music genres, while unconventional, is another way of imagining EM.
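The clustering idea described above can be sketched with scikit-learn's Dirichlet-process mixture. A minimal illustration follows; it uses the modern `BayesianGaussianMixture` interface with `weight_concentration_prior_type="dirichlet_process"` (the thesis-era `sklearn.mixture.DPGMM` class exposed the same model), and the 12-dimensional feature vectors here are synthetic stand-ins for per-song pitch/timbre summaries, not actual Million Song Dataset features.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# two synthetic "styles": illustrative 12-dimensional vectors standing in
# for per-song pitch/timbre summaries (not real MSD data)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(100, 12)),
    rng.normal(3.0, 0.5, size=(100, 12)),
])

# Dirichlet-process mixture: cap the number of components and let the
# concentration parameter (the alpha varied in Chapter 3) decide how
# many components are actually used
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,  # analogous to alpha
    random_state=0,
)
labels = dpgmm.fit_predict(X)
# typically far fewer than the 10 allowed components survive
print(len(np.unique(labels)))
```

The appeal for this thesis is that, unlike a plain GMM, the number of clusters (genres) does not have to be fixed in advance; the concentration parameter only controls the model's willingness to open new clusters.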
The practice of quantitatively analyzing music has exploded in the last decade thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review, I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what this research can contribute.
1.2 Literature Review
The quantitative analysis of music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. Looking first at the consumer-based research, Spotify and Pandora are two of the most prominent groups, and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the clusters come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music as well as by training various algorithms on a regular basis by "listening" to new songs [4][5].

Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres of music. This may serve as a good starting point to compare my final results to, but my study aims to be as context-free as possible by attaching no preconceived notions of music styles or genres, instead looking at features that could be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking where skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.
In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to intrinsically how loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has been becoming louder and less diverse:
The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
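The transposition-and-discretization step can be sketched in a few lines. The function names and the 0.5 threshold below are illustrative assumptions rather than the cited study's exact parameters; the rotation itself matches the `transpose_by_key` helper in Appendix A.4.

```python
import numpy as np

def transpose_to_c(chroma, key):
    """Rotate a 12-bin pitch (chroma) vector so the song's key maps to C."""
    return np.roll(np.asarray(chroma), -key)

def binarize(chroma, threshold=0.5):
    """Discretize pitch salience into present/absent flags (threshold assumed)."""
    return (np.asarray(chroma) >= threshold).astype(int)

# a segment in D major (key index 2): strong D, F#, and A bins
segment = [0.1, 0.0, 0.9, 0.1, 0.0, 0.2, 0.8, 0.1, 0.0, 0.7, 0.1, 0.0]
normalized = binarize(transpose_to_c(segment, key=2))
print(normalized)  # -> [1 0 0 0 1 0 0 1 0 0 0 0]
```

After this normalization, the D major segment lands on the same binary vector as any other major triad would, which is what makes pitch vectors from songs in different keys directly comparable.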
While this study lays some solid groundwork for capturing and analyzing nu-
meric qualities of music it falls short of addressing my goals in a couple of ways
First it does not perform any analysis with respect to music genre While the
analysis performed in this paper could easily be applied to a list of songs in a specific
genre certain genres might have unique sounds and rhythms relative to other genres
that would be worth studying in greater detail Second the study only measures
general trends in music over time. The models used to describe changes are simple
regressions that don't look at more nuanced changes. For example: What styles of
music developed over certain periods of time? How rapid were those changes? Which
styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes
contemporary popular Western music from the 1960s to 2010s by comparing numerical
data on the pitches and timbre of a corpus of 17,000 songs that appeared on the
Billboard Hot 100. Like the previously mentioned paper, "Measuring the Evolution
of Contemporary Western Popular Music," Mauch's study also creates abstractions
of pitch and timbre in order to provide a consistent and meaningful semantic
interpretation of musical data (see figure 1.2). However, Mauch's study takes this
idea a step further by using genre tags from Last.fm, a music website, and constructing a
hierarchy of music genres using hierarchical clustering. Additionally, the study takes
a crack at determining whether a particular band, the Beatles, was musically groundbreaking
for its time or merely playing off sounds that other bands had already used.
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's "Bohemian Rhapsody" (1975)
While both "Measuring the Evolution of Contemporary Western Popular Music"
and Mauch's study created abstractions of pitch and timbre, Mauch's study is more
appealing with respect to my goal because its end results align more closely with
mine. Additionally, the data processing pipeline offers several layers of abstraction,
and, depending on my progress, I would be able to achieve at least one of the levels of
abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken
down into its 12 timbre MFCCs and pitch components. Next, the study constructs
"lexicons," dictionaries of pitch and timbre terms to which all songs can be compared.
For pitch, the original data is an N-by-12 matrix, where N is the number of time
segments in the song and 12 the number of notes found in an octave of
pitches. Each time segment contains the relative strengths of each of the 12 pitches.
However music sounds are not merely a collection of pitches but more precisely
chords Furthermore the similarity of two songs is not determined by the absolute
pitches of their chords but rather the progression of chords in the song all relative to
each other For example if all the notes in a song are transposed by one step the song
will sound different in terms of absolute pitch but the song will still be recognized
as the original because all of the relative movements from each chord to the next
are the same This phenomenon is captured in the pitch data by finding the most
likely chord played at each time segment then counting the change to the next chord
at each time step and generating a table of chord change frequencies for each song
Constructing the timbre lexicon is more complicated, since there is no easy analogue
to chords with which songs can be compared. Mauch's study utilizes a Gaussian Mixture
Model (GMM), iterating over k = 1 to k = N clusters (where N is a large number),
running the GMM for each prior assumption of k clusters, and computing the Bayesian
Information Criterion (BIC) for each model. The value of k with the lowest of the N BIC
values is selected. That model contains k different timbre clusters,
and each cluster contains the mean timbre value for each of the 12 timbre components.
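The BIC-driven selection of k described above can be sketched with scikit-learn's GaussianMixture. This is a minimal illustration on toy data, not the study's code; the search range and the synthetic "timbre frames" are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative sketch: pick the number of timbre clusters k by fitting a
# GMM for each candidate k and keeping the model with the lowest Bayesian
# Information Criterion (BIC).
rng = np.random.default_rng(0)
# Toy stand-in for timbre frames: two well-separated 12-dimensional blobs.
frames = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 12)),
    rng.normal(5.0, 0.5, size=(200, 12)),
])

best_k, best_bic, best_model = None, np.inf, None
for k in range(1, 6):  # the study searched a much larger range (up to 100)
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

# best_model.means_ holds the mean value of each of the 12 timbre
# components for every cluster, as in the lexicon described above.
print(best_k)
```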
For my research, I decided that the pitch and timbre lexicons would be the most
realistic level of abstraction I could obtain. Mauch's study adds an additional layer
to pitch and timbre by identifying the most common patterns of chord changes and
most common timbre rhythms and creating more general tags from these combined
terms, such as "stepwise changes indicating modal harmony" for a pitch topic and
"oh, rounded, mellow" for a timbral topic. There were two problems with using this
final layer of abstraction for my study. First, attaching semantic interpretations to
the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen
to sound samples containing all of the different timbral categories I identified and
attach user interpretations to them. For the chords, not only would I have to
perform the same analysis as on timbre, but also pay careful attention to identify which
chords correspond to common sound progressions in popular music, a task that I am
not qualified for and did not have the resources for this thesis to seek out. Second,
this final layer of abstraction was not necessary for the end goal of my paper. In
fact, consolidating my pitch and timbre lexicons into simpler phrases would run the
risk of pigeonholing my analysis and preventing me from discovering more nuanced
patterns in my final results. Therefore, I decided to focus on pitch and timbral
lexicon construction as the furthest levels of abstraction when processing songs for
my thesis. Mathematical details on how I constructed the pitch and timbre lexicons
can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I needed access to an extensive database of
music. Until recently, acquiring a substantial corpus of music data was a difficult and
costly task. It is illegal to download music audio files from video and music-sharing
sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer
90-second previews of songs, but segments of songs, and usually segments
that showcase the chorus, are not reliable measures of the entire
essence of a song. Even if I were to legally download entire audio files for free, I would
run into additional issues. Obtaining a high-quality corpus of song data would be
challenging, as scripts that crawl music sharing platforms may not capture all of
the music I am looking for. And once I had the audio files, I would have to perform
audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem:
the Million Song Dataset (MSD), a collection of metadata for one million music
tracks dating up to 2011. Various organizations, such as The Echo Nest, MusicBrainz,
7digital, and Last.fm, have contributed different pieces of metadata. Each song is
represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a
JSON object. The fields encompass topical features such as the song title, artist,
and release date, as well as lower-level features such as the loudness, starting beat
times, pitches, and timbre of several segments of the song [9]. While the MSD is
the largest free and open source music metadata dataset I could find there is no
guarantee that it adequately covers the entire spectrum of EM artists and songs
This quality limitation is important to consider throughout the study A quick look
through the songs including the subset of data I worked with for this report showed
that there were several well-known artists and songs in the EM scene Therefore
while the MSD may not contain all desired songs for this project it contains an
adequate number of relevant songs to produce some meaningful results Additionally
laying the groundwork for modeling the similarities between songs and identifying
groundbreaking ones is the same regardless of the songs included and the following
methodologies can be implemented on any similarly-formatted dataset including one
with songs that might currently be missing in the MSD
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be,
an important aspect of my research. My problem, how to mathematically determine
which songs were unique for their time, requires an algorithm in which each song is
introduced in chronological order, either joining an existing category or starting a
new category based on its musical similarity to songs already introduced. Clustering
algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the
partitioning of a dataset into a predetermined number of clusters, assume that number
is fixed in advance. While this process would work if we knew
exactly how many genres of EM existed, if we guess wrong, our end results may end
up with clusters that are wrongly grouped together or separated. It is much better to
apply a clustering algorithm that does not make any assumptions about this number.
One particularly promising process that addresses the issue of the number of
clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are
useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters, and (2) by sorting the songs
in chronological order before running the algorithm and keeping track of which
songs are categorized under each cluster, we can observe the earliest songs in each
cluster and consequently infer which songs were responsible for creating new
clusters. The DP is controlled by a concentration parameter α. The expected number
of clusters formed is directly proportional to the value of α, so the higher the
value of α, the more likely new clusters will be formed [10]. Regardless of the value
of α, as the number of data points introduced increases, the probability of a new
group being formed decreases. That is, a "rich get richer" policy is in place, and
existing clusters tend to grow in size. Tweaking the value of the tunable parameter
α is an important part of the study, since it determines the flexibility given to
forming a new cluster. If the value of α is too small, then the criteria for forming
clusters will be too strict, and data that should be in different clusters will be
assigned to the same cluster. On the other hand, if α is too large, the algorithm
will be too sensitive and assign similar songs to different clusters.
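The "rich get richer" behavior and the role of α can be illustrated with a small simulation of the Chinese Restaurant Process view of a Dirichlet Process. This is a hedged sketch, not part of the thesis's pipeline; the sample sizes and α values are arbitrary.

```python
import random

def crp_cluster_count(n_points, alpha, seed=0):
    """Simulate Dirichlet Process cluster assignments (Chinese Restaurant
    Process view) and return how many clusters form. The i-th point opens
    a new cluster with probability alpha / (i + alpha); otherwise it joins
    an existing cluster with probability proportional to its size."""
    rnd = random.Random(seed)
    sizes = []  # sizes[c] = number of points currently in cluster c
    for i in range(n_points):
        if rnd.random() < alpha / (i + alpha):
            sizes.append(1)  # open a new cluster
        else:
            # "rich get richer": bigger clusters are more likely to grow
            pick = rnd.choices(range(len(sizes)), weights=sizes)[0]
            sizes[pick] += 1
    return len(sizes)

# Higher alpha -> more clusters for the same amount of data.
print(crp_cluster_count(1000, alpha=1.0), crp_cluster_count(1000, alpha=50.0))
```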
The implementation of the DP was achieved using scikit-learn's library and API for the
Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet
Process model used to cluster the data. More specifically, scikit-learn's
implementation of the DPGMM uses the stick-breaking method,
one of several equally valid methods to assign songs to clusters [11]. While the
mathematical details for this algorithm can be found at the following citation [12],
the most important aspects of the DPGMM are the arguments that the user can
specify and tune. The first of these tunable parameters is the value α, which is the
same parameter as the α discussed in the previous paragraph. As seen in Figure 2.1,
on the right side, properly tuning α is key to obtaining meaningful clusters. The
center image has α set to 0.01, which is too small and results in all of the data being
placed under one cluster. On the other hand, the bottom-right image has the same
data set and α set to 100, which does a better job of clustering. On a related note,
the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the
left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image
assumes 5 clusters as a prior and consequently clusters the data incorrectly, while
the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs for the DPGMM is the data that
will be clustered The scikit-learn implementation takes the data in the format
of a nested list (N lists each of length m) where N is the number of data points
and m the number of features While the format of the data structure is relatively
straightforward, choosing which numbers should be in the data was a challenge I
faced. Selecting the relevant features of each song to be used in the algorithm will
be expounded upon in the next section, "Feature Selection."
The last argument that a user inputs for the scikit-learn DPGMM implementation
indicates the upper bound for the number of clusters. The Dirichlet Process then
determines the best number of clusters for the data between 1 and the upper bound.
Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper
bound of 50 clusters and focused more on the tuning of α to modify the number of
clusters formed.
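A call with the three arguments described above (α, the N-by-m data, and the cluster upper bound) can be sketched as follows. Note an assumption: the thesis used scikit-learn's original DPGMM class, which has since been removed from the library; current scikit-learn exposes the same stick-breaking Dirichlet Process model as BayesianGaussianMixture, so that class is used here, on toy data.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Sketch of the DPGMM invocation. BayesianGaussianMixture with a
# "dirichlet_process" prior is the modern scikit-learn equivalent of the
# removed DPGMM class used in the thesis.
rng = np.random.default_rng(1)
# N songs x m features, matching the nested-list format the API expects.
songs = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(8, 1, (100, 4))])

dpgmm = BayesianGaussianMixture(
    n_components=50,                                  # upper bound on clusters
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,                   # the alpha parameter
    random_state=0,
).fit(songs)

labels = dpgmm.predict(songs)
print(len(set(labels)))  # number of clusters the model actually used
```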
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features
to be used for clustering In other words when we organize the songs into clusters
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and
intuitively logical In the Million Song Dataset [9] each song is represented as a
JSON object containing several fields These fields are candidate features to be used
in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up"
by Rick Astley, and the corresponding features:
artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count shape = (4) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name Rick Astley (artist name)
artist_playmeid 1338 (the ID of that artist on the service playme.com)
artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest
(number between 0 and 1))
artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest
(number between 0 and 1))
audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for
the analysis by The Echo Nest)
bars_confidence shape = (99) (confidence value (between 0 and 1) associated
with each bar by The Echo Nest)
bars_start shape = (99) (start time of each bar according to The Echo Nest this
song has 99 bars)
beats_confidence shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start shape = (397) (start time of each beat according to The Echo Nest
this song has 397 beats)
danceability 0.0 (danceability measure of this song according to The Echo Nest
(between 0 and 1, 0 => not analyzed))
duration 211.69587 (duration of the track in seconds)
end_of_fade_in 0.139 (time of the end of the fade in, at the beginning of the
song, according to The Echo Nest)
energy 0.0 (energy measure (not in the signal processing sense) according to The
Echo Nest (between 0 and 1, 0 => not analyzed))
key 1 (estimation of the key the song is in by The Echo Nest)
key_confidence 0.324 (confidence of the key estimation)
loudness -7.75 (general loudness of the track)
mode 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence 0.434 (confidence of the mode estimation)
release Big Tunes - Back 2 The 80s (album name from which the track was taken;
some songs' tracks can come from many albums, we give only one)
release_7digitalid 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence shape = (10) (confidence value (between 0 and 1) associated
with each section by The Echo Nest)
sections_start shape = (10) (start time of each section according to The Echo
Nest this song has 10 sections)
segments_confidence shape = (935) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max shape = (935) (max loudness during each segment)
segments_loudness_max_time shape = (935) (time of the max loudness
during each segment)
segments_loudness_start shape = (935) (loudness at the beginning of each
segment)
segments_pitches shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start shape = (935) (start time of each segment (musical event or
onset) according to The Echo Nest; this song has 935 segments)
segments_timbre shape = (935 12) (MFCC-like features for each segment)
similar_artists shape = (100) (a list of 100 artists (their Echo Nest ID) similar
to Rick Astley according to The Echo Nest)
song_hotttnesss 0.864248830588 (according to The Echo Nest, when downloaded
(in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 and 1))
song_id SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can
be associated with many tracks (with very slight audio differences))
start_of_fade_out 198.536 (start time of the fade out, in seconds, at the end
of the song, according to The Echo Nest)
tatums_confidence shape = (794) (confidence value (between 0 and 1) associated
with each tatum by The Echo Nest)
tatums_start shape = (794) (start time of each tatum according to The Echo
Nest this song has 794 tatums)
tempo 113.359 (tempo in BPM according to The Echo Nest)
time_signature 4 (time signature of the song according to The Echo Nest, i.e.,
usual number of beats per bar)
time_signature_confidence 0.634 (confidence of the time signature estimation)
title Never Gonna Give You Up (song title)
track_7digitalid 8707738 (the ID of this song on the service 7digital.com)
track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on
which the analysis was done)
year 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most
likely yield meaningful results yet also be simple and make sense to the average
person. The definition of "meaningful" results is subjective, as every music listener
will have his or her own opinions as to what constitutes different types of music, but some
common features most people tend to differentiate songs by are pitch, rhythm, and
the types of instruments used. The following specific fields provided in each song
object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or
note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like
features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between
songs, since it reveals patterns of notes that occur. Additionally, other research
papers that quantitatively examine songs, like Mauch's, look at pitch and employ a
procedure that allows all songs to be compared with the same metric. Likewise,
timbre is intuitively a reliable differentiating feature, since it reveals the prevalence
of different tones, sounds that sound different despite having the same pitch.
Therefore, segments_timbre is another feature that is considered for each song.
Finally we look at the candidate features for rhythm At first glance all of these
features appear to be useful as they indicate the rhythm of a song in one way or
another However none of these features are as useful as the pitch and timbre
features While tempo is one factor in differentiating genres of EDM and music in
general, tempo alone is not a driving force of musical innovation. Certain genres
of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos,
but the tempo is supplemented with a sound unique to the genre. Conceiving new
arrangements of pitches, combining instruments in new ways, and inventing new
types of sounds are novel, but speeding up or slowing down existing sounds is not.
Including tempo as a feature could actually add noise to the model, since many genres
overlap in their tempos. And finally, tempo is measured indirectly when the pitch
and timbre features are normalized: for each song, everything is measured in units of
"per second," so faster songs will have higher quantities of pitch and timbre features
each second. Time signature can be dismissed from the candidate features for the
same reason as tempo: many genres share the same time signature, and including
it in the feature set would only add more noise. beats_start looks like a more
promising feature since, like segments_pitches and segments_timbre, it consists of
a vector of values However difficulties arise when we begin to think how exactly
we can utilize this information Since each song varies in length we need a way to
compare songs of different durations on the same level One approach could be to
perform basic statistics on the distance between each beat for example calculating
the mean and standard deviation of this distance However the normalized pitch
and timbre information already capture this data Another possibility is detecting
certain patterns of beats which could differentiate the syncopated dubstep or glitch
music beats from the steady pulse of electro-house But once again every beat is
accompanied by a sound with a specific timbre and pitch so this feature would not
add any significantly new information
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research I first needed to collect
all of the electronic songs in the Million Song Dataset The easiest reliable way to
achieve this was to iterate through each song in the database and save the information
for the songs where any of the artist genre tags in artist_mbtags matched with an
electronic music genre. While this measure was not fully accurate, because it looks at
the genre of the artist, not the song, specific genre information for each song was not
as easily accessible, so this indicator was nearly as good a substitute. To generate a
list of the genres that electronic songs would fall under, I manually searched through
a subset of the MSD to find all genres that seemed to be related to electronic music.
In the case of genres that were sometimes, but not always, electronic in nature, such
as disco or pop, I erred on the side of caution and did not include them in the list
of electronic genres. In these cases, false positives, such as primarily rock songs that
happen to have the disco label attached to the artist, could inadvertently be included
in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
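The collection step above can be sketched as a simple tag-matching filter. This is an illustrative sketch: the song dictionaries and the abbreviated genre list are hypothetical, with field names mirroring the MSD metadata.

```python
# Hypothetical sketch: keep a song when any of its artist's musicbrainz
# genre tags matches the electronic-genre list. The corpus below is toy
# data; only a few genres from the full target list are shown.
target_genres = ['house', 'techno', 'trance', 'dubstep', 'ambient',
                 'electronic']

def is_electronic(song):
    """True if any artist genre tag matches a target electronic genre."""
    tags = [t.lower() for t in song.get('artist_mbtags', [])]
    return any(tag in target_genres for tag in tags)

corpus = [
    {'title': 'Track A', 'artist_mbtags': ['Techno', 'german']},
    {'title': 'Track B', 'artist_mbtags': ['rock']},
]
print([s['title'] for s in corpus if is_electronic(s)])  # ['Track A']
```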
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically
informed manner. The study first takes the raw sound data and converts it into
a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest
amount. Then it computes the most likely chord by comparing the 4 most
common types of chords in popular music (major, minor, dominant 7, and minor 7) to
the observed chord. The most common chords are represented as "template chords"
containing 0's and 1's, where the 1's represent the notes played in the chord. For
example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is
computed against every template chord:

\rho_{CT,c} = \frac{\sum_{i=1}^{12} (CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}
where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard
deviation of the values in the chord, and the operations on c are analogous. Note
that the summation is over each of the 12 pitch classes. The chord
template with the highest value of ρ is selected as the chord for the time frame.
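The template-matching step can be sketched as follows. Two hedges: the study uses Spearman's rho, while this sketch substitutes a Pearson-style correlation as a simplified stand-in, and only the four C-rooted templates are shown (rotating each template would cover the other 11 roots).

```python
import math

# Simplified sketch of template-chord matching over the 12 pitch classes.
TEMPLATES = {
    "C major":      [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    "C minor":      [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "C dominant 7": [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    "C minor 7":    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}

def correlation(x, y):
    """Pearson-style correlation between a template chord and a chroma frame."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return cov / (sx * sy)

def best_chord(chroma):
    """Pick the template chord most correlated with the observed frame."""
    return max(TEMPLATES, key=lambda name: correlation(TEMPLATES[name], chroma))

# A frame with strong C, E, and G matches the C major template best.
print(best_chord([0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]))
```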
After this is performed for each time frame the values are smoothed and then the
change between adjacent chords is observed The reasoning behind this step is that
by measuring the relative distance between chords rather than the chords themselves
all songs can be compared in the same manner even though they may have different
key signatures. Finally, the study takes the types of chord changes and classifies
them under 8 possible categories called "H-topics." These topics are more abstracted
versions of the chord changes that make more sense to a human, such as "changes
involving dominant 7th chords."
In my preliminary implementation of this method on an electronic dance music
corpus, I made a few modifications to Mauch's study. First, I smoothed out time
frames before computing the most probable chords, rather than smoothing the most
probable chords. I did this to save time and to reduce volatility in the chord
measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference,
which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly
under 1 second and, for preliminary testing, appeared to be a good interval for each
time block. Second, as mentioned in the literature section, I did not abstract the
chord changes into H-topics. This decision also stemmed from time constraints, since
deriving semantic chord meaning from EDM songs would require careful research
into the types of harmonies and sounds common in that genre of music. Below I
have included a high-level visualization of the pitch metadata found in a sample song,
"Firestarter" by The Prodigy, and how I converted the metadata into a chord change
vector that I could then feed into the Dirichlet Process algorithm.
[Figure: pitch-processing pipeline illustrated with "Firestarter" by The Prodigy. Starting
with the raw pitch data, an N×12 matrix where N is the number of time frames in the song
and 12 the number of pitch classes, the pitch distribution is averaged over every 5 time
frames; the most likely chord for each block (e.g., F# major = (0,1,0,0,0,0,1,0,0,0,1,0))
is calculated using Spearman's rho; the change between each pair of adjacent chords (e.g.,
a major-to-major shift of step size 2) is assigned one of 192 possible chord change codes;
and each change increments a count in a table of chord change frequencies, yielding a final
192-element vector where chord_changes[i] is the number of times the chord change with
code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by
the length of the song, so that each song's number of chord changes was measured per
second.
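The chord-change tabulation and per-second normalization can be sketched as follows. The exact code numbering used for the 192 chord changes is not reproduced here, so the mapping below (root shift of 0-11 semitones crossed with the 4×4 chord-type pairs) is a hypothetical but consistent choice.

```python
# Illustrative sketch of the 192-element chord-change table:
# 12 root shifts * 4 from-types * 4 to-types = 192 possible codes.
TYPES = ['major', 'minor', 'dominant 7', 'minor 7']

def change_code(from_root, from_type, to_root, to_type):
    """Hypothetical encoding of a chord change as an integer in [0, 192)."""
    shift = (to_root - from_root) % 12
    return shift * 16 + TYPES.index(from_type) * 4 + TYPES.index(to_type)

def chord_change_vector(chords, duration_seconds):
    """chords: list of (root, type) per time block. Returns the 192-element
    frequency vector, normalized to chord changes per second."""
    counts = [0] * 192
    for (r1, t1), (r2, t2) in zip(chords, chords[1:]):
        counts[change_code(r1, t1, r2, t2)] += 1
    return [c / duration_seconds for c in counts]

# Three chord changes in a 10-second snippet (roots: 6 = F#, 8 = G#).
vec = chord_change_vector([(6, 'major'), (8, 'major'), (6, 'major'),
                           (8, 'major')], 10.0)
print(round(sum(vec), 3))  # 3 changes over 10 seconds
```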
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8]. After collecting all song metadata, I took a random
sample of 20 songs from each year, starting at 1970. The reason I forced the sampling
to 20 randomly sampled songs from each year, rather than taking a random sample of
songs from all years at once, was to prevent bias towards any type of sounds. As seen
in figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The
mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the
years is σ = 7.060. A "random sample" over all songs would almost certainly include
a disproportionate amount of more recent songs. In order to not miss out on sounds
that may be more prevalent in older songs, I required a set number of songs from each
year. Next, from each randomly selected song, I selected 20 random timbre frames,
in order to prevent any biases in data collection within each song. In total, there
were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames
using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to
100 and selecting the number of clusters with the lowest Bayesian Information Criterion
(BIC), a statistical measure commonly used to select the best-fitting model. The
BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters
and saved the mean values of each of the 12 timbre components for each cluster formed.
In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs, each song now had the same 46 timbre clusters
but different frequencies in each song. When reading in the metadata from each song,
I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song. Finally,
as with the pitch data, I divided all observed counts by the duration of the song in
order to normalize each song's timbre counts.
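The per-song timbre counting step can be sketched as follows. One simplification is flagged: the thesis assigns each frame to its most likely GMM component, while this sketch uses nearest-cluster-mean assignment (Euclidean distance) as a stand-in, with toy cluster means in place of the 46 learned ones.

```python
import math

# Simplified sketch of timbre-frequency counting: assign each 12-dimensional
# timbre frame to the nearest cluster mean, then normalize counts by song
# duration so every song is measured in cluster hits per second.
def nearest_cluster(frame, means):
    return min(range(len(means)),
               key=lambda c: math.dist(frame, means[c]))

def timbre_frequencies(frames, means, duration_seconds):
    counts = [0] * len(means)  # one slot per timbre cluster (46 in the thesis)
    for frame in frames:
        counts[nearest_cluster(frame, means)] += 1
    return [c / duration_seconds for c in counts]

means = [[0.0] * 12, [5.0] * 12]   # two toy cluster centers
frames = [[0.1] * 12, [4.9] * 12, [5.2] * 12]
print(timbre_frequencies(frames, means, duration_seconds=30.0))
```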
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord-change frequency list and the 46-element timbre-category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as the timbre data. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating the copies to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
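The feature construction with timbre duplication can be sketched as follows; the number of copies here is an illustrative value of my own, since the exact duplication count is not fixed at this point.

```python
import numpy as np

def build_feature_vector(chord_freqs, timbre_freqs, timbre_copies=4):
    """Concatenate the 192 chord-change frequencies with several copies
    of the 46 timbre-category frequencies, so the timbre features are not
    swamped by the roughly 3x larger pitch block. timbre_copies=4 is an
    illustrative value, not one taken from the thesis."""
    return np.concatenate([chord_freqs] + [timbre_freqs] * timbre_copies)

features = build_feature_vector(np.zeros(192), np.ones(46))
print(features.shape)  # 192 + 4 * 46 = 376 features per song
```

With `timbre_copies=1` this reduces to the plain 238-feature concatenation described above.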
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10 and often hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but the solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, and a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α over an extreme range with the data as it currently is, doing so uses the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not tags specific to each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is fairly simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord-change and timbre-category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. These values of α resulted in 9, 14, and 19 clusters, respectively.
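The runs described above can be sketched as follows. The DPGMM class in the 2016-era scikit-learn used for this thesis has since been removed; its present-day replacement is BayesianGaussianMixture with a Dirichlet-process weight prior. The song matrix below is synthetic, so the resulting cluster counts are for illustration only.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
songs = rng.normal(size=(300, 10))  # synthetic stand-in for the 238-feature song vectors

# One run per concentration value alpha, with the component cap set to 50
# as in the thesis; components the variational fit leaves empty receive no
# songs, which is how different alphas can yield different cluster counts.
cluster_counts = {}
for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=50,
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,
        random_state=0,
    ).fit(songs)
    cluster_counts[alpha] = len(np.unique(dpgmm.predict(songs)))

print(cluster_counts)
```

Larger values of α allow the stick-breaking prior to spread weight over more components, which is why the number of occupied clusters grows with α.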
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord-change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; new wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but echo
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin; rhythmic emphasis on first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast, ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense
melodies (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster.

Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer who pioneered ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre-category and chord-change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant-7th major → dominant-7th major with no note change; and type 180, dominant-7th minor → dominant-7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster with the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord-change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with the existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
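As a cross-check on the chord-change type numbers quoted earlier (0, 60, 120, and 180), the category encoding from the preprocessing code in Appendix A.2 can be reproduced in a few lines. The ordering of the four key types (1 = major through 4 = dominant-7th minor) is my assumption about the convention used.

```python
def chord_change_category(c1, c2):
    """Map a pair of consecutive chords to one of the 192 chord-change
    categories, following the encoding in Appendix A.2. Each chord is a
    (key_type, root) pair: key_type is 1 = major, 2 = minor,
    3 = dominant-7th major, 4 = dominant-7th minor (this numbering is an
    assumption), and root is a pitch class 0-11."""
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]       # one of 16 key-type pairs
    return 12 * (key_shift - 1) + note_shift  # 16 * 12 = 192 categories

# Same-root transitions reproduce the type numbers cited in the text:
print(chord_change_category((1, 0), (1, 0)))  # major -> major: 0
print(chord_change_category((2, 5), (2, 5)))  # minor -> minor: 60
print(chord_change_category((3, 7), (3, 7)))  # dom7 major -> dom7 major: 120
print(chord_change_category((4, 2), (4, 2)))  # dom7 minor -> dom7 minor: 180
```

The four frequently observed types are exactly the "no note change, same key type" diagonal of this encoding.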
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced differences in instrumentation and mood. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together based on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.

The timbre/chord-change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 consisted of vocal-free, ethereal space-synth sounds, so it makes sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From these clusters, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this case was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also reflect the difficulty of distinguishing the clusters: the y-axes of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I consider my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process can be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
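As one simple instance of such a song-to-song comparison, the normalized timbre-category histograms could be compared directly. Cosine similarity here is my own illustrative choice, far simpler than the pattern-based methods of the study cited above.

```python
import numpy as np

def timbre_similarity(a, b):
    """Cosine similarity between two songs' per-second timbre-category
    histograms; 1.0 means identical relative timbre profiles. This measure
    is an illustrative choice, not the method used in the thesis or in the
    cited study."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Proportional histograms (same timbre mix, different song lengths)
# score as maximally similar:
print(round(timbre_similarity([1, 0, 2], [2, 0, 4]), 6))
```

Because the histograms are already divided by song duration, a scale-invariant measure like this would compare timbre profiles rather than raw counts.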
4.2 Future Work
Future work in this area (quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists) would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of every electronic music song
out of the Million Song Dataset, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and timbre categories
in each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
                    1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
                    1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
                    1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
                    1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
                    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
                    2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{.*?'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-03, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in
                            segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
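As a standalone sanity check of the template-matching scheme above, the same damped correlation can be re-implemented in Python 3 and applied to a hypothetical noisy C-major pitch vector (the vector values are made up for illustration; this is not part of the thesis code):

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def rho(template, pitch_vector):
    # same damped correlation score used by find_most_likely_chord
    t_mean, t_std = mean(template), std(template)
    p_mean, p_std = mean(pitch_vector), std(pitch_vector)
    return sum((template[i] - t_mean) * (pitch_vector[i] - p_mean)
               for i in range(12)) / ((t_std + 0.01) * (p_std + 0.01))

# C major template (1s at C, E, G) and its 12 transpositions
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
majors = [c_major[-k:] + c_major[:-k] for k in range(12)]

# a noisy chroma vector with energy concentrated on C, E, and G
pitch_vector = [0.9, 0.0, 0.1, 0.0, 0.8, 0.1, 0.0, 0.9, 0.0, 0.1, 0.0, 0.0]
best = max(range(12), key=lambda k: abs(rho(majors[k], pitch_vector)))
print(best)  # → 0: the C-rooted template scores highest
```

The +0.01 terms in the denominator keep the score finite when a segment is nearly flat, which is why the thesis's version dampens both standard deviations the same way.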
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric bayes and the dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec. 2005.
Chapter 1
Introduction
1.1 Background Information
Electronic Music (EM) is an increasingly popular genre of music with an immense presence and influence on modern culture. Because the genre is new as a whole and arguably more loosely structured than other genres (technology has enabled the creation of a wide range of sounds and easy blending of existing and new sounds alike), formal analysis, especially mathematical analysis, of the genre is fairly limited and has only begun growing in the past few years. As a fan of EM, I am interested in exploring how the genre has evolved over time. More specifically, my goal with this project was to design some structure or model that could help me identify which EM artists have contributed the most stylistically to the genre. Oftentimes famous EM artists do not create novel-sounding music but rather popularize an existing style, and the motivation of this study is to understand who has stylistically contributed the most to the EM scene versus those who have merely popularized aspects of it.

As the study progressed, the manner in which I constructed my model lent itself to a second goal of the thesis: exploring new ways in which EM genres can be imagined.
While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, analysis of EM from a mathematical perspective, especially with respect to computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely because it has existed for a shorter amount of time and is less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not appear until 2012, when Professor Mark J. Butler from Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that allow a user to relate every genre to each other and easily observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped under more general genres and represents chronological evolutions by connecting the genres in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative: while each subgenre entry explains typical rhythms and sounds and includes well-known songs indicative of the style, the guide rests on its creator's historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach towards imagining the entire landscape of EM. The results may confirm Ishkur's Guide's findings, in which case his guide is given additional merit with mathematical evidence, or they may differ, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically-based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that historically makes sense. On the other hand, my
model is history-agnostic and may not capture the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and it may reveal musical styles that are more similar to each other than history would otherwise imply. This way of thinking about music genres, while unconventional, is another way of imagining EM.
The practice of quantitatively analyzing music has exploded in the last decade, thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what it can contribute.
1.2 Literature Review
The analysis of quantitative music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and targeted primarily at consumers. Looking first at the consumer-based research, Spotify and Pandora are two of the most prominent groups, and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists, or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the clusters come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music as well as by training various algorithms on a regular basis by "listening" to new songs [4][5].

Figure 1.1: A user's taste profile, generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres. This may serve as a good point of comparison for my final results, but my study aims to be as context-free as possible by attaching no preconceived notions of music styles or genres, instead looking at features that can be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking in which skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing for the aim of my study since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.
In the academic realm there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined by the standard musical notes, or the frequency of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time music has become louder and less diverse:

The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
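To make the transposition-and-discretization step concrete, here is a small sketch (Python 3; the chroma values and the 0.5 threshold are hypothetical, and the rotation follows the same (i + key) mod 12 convention as the transpose_by_key helper in the appendix):

```python
def transpose_to_c(chroma, key):
    # rotate a 12-bin pitch-class vector so the song's key maps onto C
    return [chroma[(i + key) % 12] for i in range(12)]

def binarize(chroma, threshold=0.5):
    # discretize pitch salience into present/absent
    return [1 if v >= threshold else 0 for v in chroma]

# hypothetical segment in A major (key index 9): strong A, C#, and E
segment = [0.1, 0.8, 0.0, 0.1, 0.9, 0.0, 0.1, 0.2, 0.1, 1.0, 0.0, 0.1]
normalized = binarize(transpose_to_c(segment, key=9))
print(normalized)  # → [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], a C major shape
```

After this normalization, two songs written in different keys produce the same binary vectors for the same chord shapes, which is what makes their pitch content directly comparable.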
While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in the paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that do not look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, "Measuring the Evolution of Contemporary Western Popular Music," Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see Figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a
hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's "Bohemian Rhapsody" (1975)
While both "Measuring the Evolution of Contemporary Western Popular Music" and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction,
and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in Figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or dictionaries of pitch and timbre terms that all songs can be compared to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, musical sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song.
Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords that can be used to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k = 1 to k = N clusters (where N is a large number), running the GMM for each prior assumption of k clusters and computing the Bayes Information Criterion (BIC) for each model. The value of k yielding the lowest of the N BIC values is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this
final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as for timbre, but I would also have to pay careful attention to identifying which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources for this thesis to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus of the song, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, MusicBrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs in the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.
One particularly promising approach that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
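The "rich get richer" behavior and the role of α can be illustrated with a short simulation of the Chinese Restaurant Process, one standard construction of the Dirichlet Process. This sketch is purely illustrative and is not part of the thesis code; all parameter values are made up.

```python
import random

def crp_cluster_sizes(n_points, alpha, seed=0):
    """Simulate cluster formation under a Chinese Restaurant Process.

    After n points have been assigned, a new point joins existing
    cluster j with probability size_j / (n + alpha), or opens a new
    cluster with probability alpha / (n + alpha) -- the
    "rich get richer" dynamic described in the text."""
    rng = random.Random(seed)
    sizes = []
    for n in range(n_points):
        r = rng.uniform(0, n + alpha)
        if r < alpha:
            sizes.append(1)          # start a new cluster
        else:
            r -= alpha               # walk existing clusters, weighted by size
            for j, s in enumerate(sizes):
                if r < s:
                    sizes[j] += 1
                    break
                r -= s
    return sizes

# Higher alpha -> more clusters for the same number of points.
few = len(crp_cluster_sizes(1000, alpha=0.5))
many = len(crp_cluster_sizes(1000, alpha=50.0))
```

Running this shows that `many` is far larger than `few`, matching the claim that the expected number of clusters grows with α while each individual cluster still tends to accumulate points.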
The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The
center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."
The last argument that a user inputs for the scikit-learn DPGMM implementation indicates the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
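The three arguments just described (α, the N-by-m data, and the cluster upper bound) can be sketched in code. Note that the `DPGMM` class used in the thesis has since been removed from scikit-learn; the sketch below uses its modern replacement, `BayesianGaussianMixture` with a Dirichlet-process prior, and the toy data and parameter values are illustrative only.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical feature matrix: N songs x m features (in the thesis these
# are chord-change and timbre-cluster frequencies; random toy data here).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (60, 4)), rng.normal(8, 1, (60, 4))])

dp = BayesianGaussianMixture(
    n_components=50,                                      # upper bound on clusters
    weight_concentration_prior_type="dirichlet_process",  # stick-breaking prior
    weight_concentration_prior=1.0,                       # the alpha parameter
    max_iter=500,
    random_state=0,
)
labels = dp.fit_predict(X)
n_used = len(set(labels))   # clusters actually used, at most 50
```

Even with the upper bound at 50, the Dirichlet-process prior concentrates the weight on only as many components as the data supports, which is the behavior the thesis relies on.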
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features
to be used for clustering In other words when we organize the songs into clusters
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist is db9)
artist_mbtags shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count shape = (4) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name Rick Astley (artist name)
artist_playmeid 1338 (the ID of that artist on the service playme.com)
artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest
(number between 0 and 1))
artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest
(number between 0 and 1))
audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence shape = (99) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start shape = (99) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start shape = (397) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration 211.69587 (duration of the track in seconds)
end_of_fade_in 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key 1 (estimation of the key the song is in by The Echo Nest)
key_confidence 0.324 (confidence of the key estimation)
loudness -7.75 (general loudness of the track)
mode 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence 0.434 (confidence of the mode estimation)
release Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums; we give only one)
release_7digitalid 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence shape = (10) (confidence value (between 0 and 1) associated
with each section by The Echo Nest)
sections_start shape = (10) (start time of each section according to The Echo
Nest this song has 10 sections)
segments_confidence shape = (935) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max shape = (935) (max loudness during each segment)
segments_loudness_max_time shape = (935) (time of the max loudness
during each segment)
segments_loudness_start shape = (935) (loudness at the beginning of each
segment)
segments_pitches shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start shape = (935) (start time of each segment ( musical event or
onset) according to The Echo Nest this song has 935 segments)
segments_timbre shape = (935 12) (MFCC-like features for each segment)
similar_artists shape = (100) (a list of 100 artists (their Echo Nest ID) similar
to Rick Astley according to The Echo Nest)
song_hotttnesss 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id SOCWJDB12A58A776AF (The Echo Nest song ID note that a song can
be associated with many tracks (with very slight audio differences))
start_of_fade_out 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence shape = (794) (confidence value (between 0 and 1) associated
with each tatum by The Echo Nest)
tatums_start shape = (794) (start time of each tatum according to The Echo
Nest this song has 794 tatums)
tempo 113.359 (tempo in BPM according to The Echo Nest)
time_signature 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence 0.634 (confidence of the time signature estimation)
title Never Gonna Give You Up (song title)
track_7digitalid 8707738 (the ID of this song on the service 7digital.com)
track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it distinguishes tones that sound different from one another despite having the same pitch. Therefore, segments_timbre is another feature that is considered for each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM like drum 'n' bass and happycore stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly
we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
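As a concrete illustration of the inter-beat statistics mentioned above, the computation might look like the following sketch (the beat times are made up, not real MSD data):

```python
import statistics

def beat_interval_stats(beats_start):
    """Mean and standard deviation of gaps between consecutive beats.

    beats_start: list of beat onset times in seconds, as in the MSD's
    beats_start field; the values used below are illustrative only."""
    intervals = [b - a for a, b in zip(beats_start, beats_start[1:])]
    return statistics.mean(intervals), statistics.pstdev(intervals)

# A perfectly steady 120 BPM pulse: one beat every 0.5 s.
mean_gap, sd_gap = beat_interval_stats([0.0, 0.5, 1.0, 1.5, 2.0])
# mean_gap -> 0.5, sd_gap -> 0.0
```

A syncopated track would show a larger standard deviation of intervals than a steady four-on-the-floor pulse, which is exactly the information the normalized pitch and timbre features already encode indirectly.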
2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data
Upon deciding on the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than of the song, song-specific genre information was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
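The tag-matching filter described above might look like the following sketch. The song dictionaries and the abbreviated genre set are illustrative stand-ins; only the field name artist_mbtags comes from the MSD.

```python
# Abbreviated version of the target_genres list given above.
target_genres = {'house', 'techno', 'trance', 'dubstep', 'ambient', 'electronic'}

def is_electronic(song):
    """Keep a song if any of its artist's MusicBrainz tags is a target genre."""
    return any(tag.lower() in target_genres
               for tag in song.get('artist_mbtags', []))

# Toy stand-ins for MSD song records.
songs = [
    {'title': 'Firestarter', 'artist_mbtags': ['electronic', 'british']},
    {'title': 'Never Gonna Give You Up', 'artist_mbtags': ['pop', 'dance-pop']},
]
kept = [s['title'] for s in songs if is_electronic(s)]
# kept -> ['Firestarter']
```

Note that, as discussed above, this filters on artist-level tags, so a non-electronic song by a tagged artist would still slip through.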
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as
CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

\[
\rho_{CT,c} = \frac{\sum_{i=1}^{12} \left( CT_i - \overline{CT} \right)\left( c_i - \bar{c} \right)}{\sigma_{CT}\,\sigma_c}
\]
where \(\overline{CT}\) is the mean of the values in the template chord, \(\sigma_{CT}\) is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation runs over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is roughly 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: step-by-step pitch processing for "Firestarter" by The Prodigy]
1. Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes.
2. Average the distribution of pitches over every 5 time frames.
3. Calculate the most likely chord for each block of 5 time frames using Spearman's rho (e.g. F♯ major = (0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0)).
4. For each pair of adjacent chords, calculate the change between them (e.g. a major-to-major shift) and increment its count in a table of chord change frequencies (192 possible chord changes).
5. The result is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
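The counting and per-second normalization steps can be sketched as follows. The integer chord-change encoding below is a simplified, hypothetical stand-in for the thesis's 192-code scheme; it only shares the key property that changes are measured relative to the previous chord's root, so transposed songs map to the same codes.

```python
def chord_change_vector(chords, duration_sec, n_codes=192):
    """Count chord changes per second, given one chord code per time block.

    chords: integer chord codes 0-47 (12 roots x 4 chord types).
    The change code combines the relative root shift (key-independent)
    with the pair of chord types -- 16 type pairs x 12 shifts = 192."""
    counts = [0.0] * n_codes
    for prev, cur in zip(chords, chords[1:]):
        root_shift = (cur - prev) % 12              # relative, key-independent
        type_pair = (prev // 12) * 4 + (cur // 12)  # e.g. major -> minor
        counts[type_pair * 12 + root_shift] += 1
    return [c / duration_sec for c in counts]       # normalize per second

# Toy sequence of 4 chord blocks in a 4-second "song": 3 transitions.
vec = chord_change_vector([0, 7, 7, 2], duration_sec=4.0)
```

Transposing every chord in `chords` up by the same number of steps leaves `vec` unchanged, which mirrors the key-invariance argument made earlier for comparing songs by chord progressions.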
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias toward any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames
using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to find the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre components for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
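The BIC sweep described above can be sketched with scikit-learn's GaussianMixture. The 12-dimensional "timbre frames" below are well-separated toy data, not MSD data, and the sweep covers a small range rather than the 10-100 used in the thesis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy 12-D "timbre frames": two tight, well-separated blobs.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0, 0.3, (100, 12)),
                    rng.normal(3, 0.3, (100, 12))])

# Fit a GMM for each candidate cluster count and keep the lowest BIC,
# mirroring the model-selection step in the text.
best = min((GaussianMixture(k, random_state=0).fit(frames)
            for k in range(1, 6)),
           key=lambda m: m.bic(frames))
best.n_components   # -> 2 for this toy data
```

Once the winning model is chosen, `best.means_` plays the role of the thesis's 46 saved cluster means: each new timbre frame is assigned to its most likely cluster via `best.predict`.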
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains roughly four times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
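The duplication strategy described above amounts to tiling the timbre vector before concatenation. A minimal sketch, where the multiplier of 4 is an illustrative choice (the exact number of copies is not stated in the text):

```python
import numpy as np

def build_features(chord_freqs, timbre_freqs, timbre_copies=4):
    """Concatenate pitch and timbre features, tiling the timbre vector so the
    192 pitch features do not swamp the 46 timbre features."""
    chord_freqs = np.asarray(chord_freqs, dtype=float)
    timbre_freqs = np.asarray(timbre_freqs, dtype=float)
    return np.concatenate([chord_freqs, np.tile(timbre_freqs, timbre_copies)])

# one song: 192 chord-change frequencies + 4 copies of 46 timbre frequencies
song = build_features(np.zeros(192), np.ones(46))
print(song.shape)  # (376,)
```

With one copy this reduces to the plain 238-feature concatenation; each extra copy effectively multiplies the timbre block's weight in Euclidean distance.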
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000–2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but the solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value so that we can work in the appropriate range of α is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
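The exclusion step reduces to a simple filter over the song records. A minimal sketch; the blacklist contents and song fields are illustrative (Electric Light Orchestra is the example named above, the other entries are hypothetical):

```python
# hypothetical blacklist of artists whose catalogs are mostly non-electronic
BLACKLIST = {'Electric Light Orchestra'}

def keep_song(song):
    """Drop songs whose artist is on the mostly-non-electronic blacklist."""
    return song['artist_name'] not in BLACKLIST

songs = [
    {'artist_name': 'Kraftwerk', 'title': 'Autobahn', 'year': 1974},
    {'artist_name': 'Electric Light Orchestra', 'title': 'Mr. Blue Sky', 'year': 1977},
]
filtered = [s for s in songs if keep_song(s)]
print(len(filtered))  # 1
```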
The goal of this thesis is to identify the different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it only requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. These values of α resulted in 9, 14, and 19 clusters, respectively.
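The sweep over α can be sketched with scikit-learn's BayesianGaussianMixture, the successor to the deprecated DPGMM class used at the time, whose `weight_concentration_prior` plays the role of α under a truncated Dirichlet Process prior. The data, dimensionality, and iteration counts below are illustrative stand-ins, not the thesis's actual feature matrix:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
X = 10.0 * rng.rand(300, 8)  # stand-in for the scaled song feature matrix

def cluster_counts(X, alphas, max_components=50):
    """For each concentration alpha, fit a truncated Dirichlet Process mixture
    and report how many components actually receive songs."""
    counts = {}
    for alpha in alphas:
        dp = BayesianGaussianMixture(
            n_components=max_components,
            weight_concentration_prior_type='dirichlet_process',
            weight_concentration_prior=alpha,
            max_iter=200,
            random_state=0,
        ).fit(X)
        counts[alpha] = len(np.unique(dp.predict(X)))
    return counts

counts = cluster_counts(X, [0.05, 0.1, 0.2])
print(counts)
```

`n_components=50` mirrors the upper limit set above; larger α tends to leave more of the 50 available components populated.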
3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster | Song Count | Characteristic Sounds
0       | 6481       | Minimalist, industrial space sounds, dissonant chords
1       | 5482       | Soft, New Age, ethereal
2       | 2405       | Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3       | 360        | Very dense and complex synths, slightly darker tone
4       | 4550       | Heavily distorted rock and synthesizer
6       | 2854       | Faster-paced 80s synth rock, acid house
8       | 798        | Aggressive beats, dense house music
9       | 1464       | Ambient house, trancelike, strong beats, mysterious tone
11      | 1597       | Melancholy tones; New Wave rock in 80s, then starting in 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs and, since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0       | 1339       | Instrumental and disco with 80s synth
1       | 2109       | Simultaneous quarter-note and sixteenth-note rhythms
2       | 4048       | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3       | 1353       | Strong repetitive beats, ambient
4       | 2446       | Strong simultaneous beat and synths; synths defined but echo
5       | 2672       | Calm, New Age
6       | 542        | Hi-hat cymbals, dissonant chord progressions
7       | 2725       | Aggressive punk and alternative rock
9       | 1647       | Latin, rhythmic emphasis on first and third beats
11      | 835        | Standard medium-fast rock instruments/chords
16      | 1152       | Orchestral, especially violins
18      | 40         | "Martian alien" sounds, no vocals
20      | 1590       | Alternating strong kick and strong high-pitched clap
28      | 528        | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. Three of the clusters consisted of one song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0       | 4075       | Nostalgic and sad-sounding synths and string instruments
1       | 2068       | Intense, sad, cavernous (mix of industrial metal and ambient)
2       | 1546       | Jazz/funk tones
3       | 1691       | Orchestral with heavy 80s synths, atmospheric
4       | 343        | Arpeggios
5       | 304        | Electro, ambient
6       | 2405       | Alien synths, eerie
7       | 1264       | Punchy kicks and claps, 80s/90s tilt
8       | 1561       | Medium tempo, 4/4 time signature, synths with intense guitar
9       | 1796       | Disco rhythms and instruments
10      | 2158       | Standard rock with few (if any) synths added on
12      | 791        | Cavernous, minimalist ambient (non-electronic instruments)
14      | 765        | Downtempo, classic guitar riffs, fewer synths
16      | 865        | Classic acid house sounds and beats
17      | 682        | Heavy Roland TR sounds
22      | 14         | Fast, ambient, classic orchestral
23      | 578        | Acid house with funk tones
30      | 31         | Very repetitive rhythms, one or two tones
34      | 88         | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic
textures (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song are remaining in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those of other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3(0.1) (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9(0.05). Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3(0.1), they were different from the earliest artists in cluster 9(0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16(0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together on certain instruments or sounds. Another cluster, 28(0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6(0.1), has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters for α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and spacey sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters for the other two α values, like cluster 17(0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28(0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing different clusters: the y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28(0.1), for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small fraction of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful, picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are those summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and closing words regarding this thesis.
4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various factors working against it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists of their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it does at all? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described but not examined as thoroughly from a mathematical angle.
Appendix A

Code

A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean timbre values over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters # not on adroit
import sklearn.mixture
import msd_utils # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20 # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20 # number of songs to sample from each year
k = 20 # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(timbre_vector))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
While there exists an extensive amount of research analyzing music trends from
a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM
from a mathematical perspective, and especially with respect to any computationally
measurable trends in the genre, is close to nonexistent. EM has been analyzed to a
lesser extent than other common genres of music in the academic world, most likely
because it has existed for a shorter amount of time and is less rooted in prominent
social and cultural events. In fact, the first published reference work on EM did not
appear until 2012, when Professor Mark J. Butler from Northwestern University edited
and published Electronica, Dance and Club Music, a collection of essays exploring
EM genres and culture [1]. Furthermore, there are very few comprehensive visual
guides that allow a user to relate every genre to each other and easily observe how
different genres converge and diverge. While conducting research, the best guide I
found was not a scholarly source but an online guide created by an EM enthusiast:
Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific
genres grouped by more general genres and represents chronological evolutions by
connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene
I could find. However, the guide's analysis is very qualitative. While each subgenre
contains an explanation of typical rhythm and sounds, and includes well-known
songs indicative of the style, the guide was created by someone drawing on historical and
personal knowledge of EM. My model, which creates music genres by chronologically
ordering songs and then assigning them to clusters, is a different approach toward
imagining the entire landscape of EM. The results may confirm Ishkur's Guide's
findings, in which case his guide is given additional merit with mathematical evidence,
or they may differ, suggesting that there may be better ways to group EM
genres. One advantage that guides such as Ishkur's and historically based scholarly
works have over my approach is that those models are history-sensitive and therefore
may group songs in a way that historically makes sense. On the other hand, my
model is history-agnostic and may not recognize the historical context of songs when
clustering. However, I believe that there is still significant merit to my research.
Instead of classifying genres of music by the early genres that led to them, my approach
gives the most credit to the artists and songs that were the most innovative for their
time, and perhaps reveals different musical styles that are more similar to each other
than history would otherwise imply. This way of thinking about music genres, while
unconventional, is another way of imagining EM.
The practice of quantitatively analyzing music has exploded in the last decade,
thanks to technological and algorithmic advances that allow data scientists to
constructively sift through troves of music and listener information. In the literature
review, I will focus on two particular organizations that have contributed greatly to
the large-scale mathematical analysis of music: Pandora, a website that plays songs
similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics
firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly
feature [3]. After evaluating the relevance of these sources to my thesis work, I will
then look over the relevant academic research and evaluate what this research can
contribute.
1.2 Literature Review
The analysis of quantitative music generally falls into two categories: research
conducted by academics and academic organizations for scholarly purposes, and research
conducted by companies and primarily targeted at consumers. Looking first at the
consumer-based research, Spotify and Pandora are two of the most prominent
groups, and the two I decided to focus on. Spotify is a music streaming service where
users can listen to albums and songs from a wide variety of artists, or listen to weekly
playlists generated based on the music the user and the user's friends have listened to.
The weekly playlist, called the Discover Weekly playlist, is a relatively new feature in
Spotify and is driven by music analysis algorithms created by Echo Nest. Using
the Echo Nest code interface, Spotify creates a "taste profile" for each user, which
assesses attributes such as how often a user branches out to new styles of music, how
closely the user's streamed music follows popular Billboard music charts, and so on.
Spotify also looks at the artists and songs the user streamed and creates clusters
of different genres that the user likes (see figure 1.1). The taste profile and music
clusters can then be used to generate playlists geared to a specific user. The genres
in the clusters come from a list of nearly 800 names, which are derived by scraping
the Internet for trending terms in music, as well as by training various algorithms on a
regular basis by "listening" to new songs [4][5].
Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the
landscape of established and emerging genres of music, the methodology is limited to
pre-defined genres of music. This may serve as a good starting point to compare my
final results to, but my study aims to be as context-free as possible, attaching no
preconceived notions of music styles or genres and instead looking at features that could
be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on
existing genres, Pandora takes a very low-tech and context-free approach to music
clustering. Pandora created the Music Genome Project, a multi-year undertaking
where skilled music theorists listened to a large number of songs and analyzed up to
450 characteristics in each song [6]. Pandora's approach is appealing to the aim of
my study, since it does not take any preconceived notions of what a genre of music
is, instead comparing songs on common characteristics such as pitch, rhythm, and
instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists
at my disposal, nor do I have 10 years to perform such calculations like the dedicated
workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music
Genome Project is intellectual property, so at best I can only rely on the abstract
concepts of the Music Genome Project to drive my study.
In the academic realm, there are no existing studies analyzing quantifiable changes in
EM specifically, but there exist a few studies that perform such analysis on popular
Western music in general. One such study is Measuring the Evolution of Contemporary
Western Popular Music, which analyzes music from 1955-2010 spanning all
common genres. Using the Million Song Dataset, a free public database of songs
each containing metadata (see section 1.3), the study focuses on the attributes pitch,
timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of
the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients
(MFCCs) of a transformed sound signal. More informally, it refers to the sound color,
texture, or tone quality, and is associated with instrument types, recording resources,
and production techniques. In other words, two sounds that have the same pitch
but different tones (for example, a bell and a voice) are differentiated by their timbres.
There are 12 MFCCs that define the timbre of a given sound. Finally, loudness
refers to how intrinsically loud the music sounds, not loudness that a listener can
manipulate while listening to the music. Loudness is the first MFCC of the timbre
of a sound [7]. The study concluded that over time, music has been becoming louder
and less diverse:
The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in
a few ways. First, it utilizes the Million Song Dataset, which addresses the issue
of legally obtaining music metadata. As mentioned in section 1.3, the only legal
way to obtain playable music for this study would have been to purchase every song I
would include, which is infeasible. While the Million Song Dataset does not contain
the audio files in playable format, it does contain audio features and metadata that
allow for in-depth analysis. In addition, working with the dataset takes out the
work of extracting features from raw audio files, saving an extensive amount of time
and energy. Second, the study establishes specifics for what constitutes a trend
in music. Pitch, timbre, and loudness are core features of music, and examining the
distributions of each among songs over time reveals a lot of information about how
the music industry and consumers' tastes have evolved. While these are not all of the
features contained in a song, they serve as a good starting point. Third, the study
defines mathematical ways to capture music attributes and measure their change
over time. For example, pitches are transposed into the same tonal context, with
binary, discretized pitch descriptions based on a threshold, so that each song can be
represented with vectors of pitches that are normalized and compared to other songs.
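To make that normalization step concrete, the two operations can be sketched as follows. This is a minimal illustration, not code from the study: the function names and the 0.5 threshold are assumptions for the sake of the example.

```python
def transpose_to_common_key(pitch_vector, key):
    """Rotate a 12-element pitch-strength vector so that the song's key
    (0 = C, ..., 11 = B) is moved to position 0, putting every song in
    the same tonal context before comparison."""
    return [pitch_vector[(i + key) % 12] for i in range(12)]

def discretize_pitches(pitch_vector, threshold=0.5):
    """Binary, discretized pitch description: 1 where a pitch class is
    strong enough (at or above the threshold), 0 otherwise."""
    return [1 if p >= threshold else 0 for p in pitch_vector]
```

Transposing first and discretizing second means two performances of the same melody in different keys end up with identical binary vectors.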
While this study lays some solid groundwork for capturing and analyzing numeric
qualities of music, it falls short of addressing my goals in a couple of ways.
First, it does not perform any analysis with respect to music genre. While the
analysis performed in this paper could easily be applied to a list of songs in a specific
genre, certain genres might have unique sounds and rhythms relative to other genres
that would be worth studying in greater detail. Second, the study only measures
general trends in music over time. The models used to describe changes are simple
regressions that don't look at more nuanced changes. For example, what styles of
music developed over certain periods of time? How rapid were those changes? Which
styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes
contemporary popular Western music from the 1960s to 2010s by comparing
numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the
Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution
of Contemporary Western Popular Music, Mauch's study also creates abstractions
of pitch and timbre in order to provide a consistent and meaningful semantic
interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a
step further by using genre tags from Last.fm, a music website, and constructing a
hierarchy of music genres using hierarchical clustering. Additionally, the study takes
a crack at determining whether a particular band, the Beatles, was musically
groundbreaking for its time or merely playing off sounds that other bands had already used.
Figure 12 Data processing pipeline for Mauchrsquos study illustrated with a segment ofQueenrsquos Bohemian Rhapsody 1975
While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction,
and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons", or dictionaries of pitch and timbre terms that all songs can be compared to. For pitch, the original data is an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, music sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords by which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k = 1 to k = N clusters, where N is a large number, running the GMM with each prior assumption of k clusters and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as for timbre, but I would also have to pay careful attention to identifying which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources for this thesis to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbre lexicons can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I needed access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising approach that addresses the issue of the number of clusters is a family of models known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and, consequently, infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
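The "rich get richer" dynamic can be made concrete with a small simulation of the Chinese Restaurant Process, one common construction of the DP (a sketch for intuition only; the assignments here ignore song features entirely and depend only on α and on existing cluster sizes):

```python
import random

def crp_partition(n_points, alpha, seed=0):
    """Assign n_points to clusters via the Chinese Restaurant Process.
    Point i joins an existing cluster with probability proportional to that
    cluster's current size, or starts a new cluster with probability
    proportional to alpha (the "rich get richer" behavior)."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points currently in cluster k
    for i in range(n_points):
        r = rng.uniform(0, i + alpha)  # total weight = i existing points + alpha
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1   # join existing cluster k
                break
        else:
            counts.append(1)     # start a new cluster
    return counts

# Larger alpha tends to produce more clusters for the same number of points.
print(len(crp_partition(1000, alpha=0.5)), len(crp_partition(1000, alpha=50)))
```

Note how the probability of opening a new cluster, α/(i + α), shrinks as more points i are introduced, exactly the behavior described above.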
The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the Stick-Breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen in Figure 2.1, on the right side, properly tuning α is key to obtaining meaningful clusters. The
center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection".

The last argument that a user inputs for the scikit-learn DPGMM implementation is the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
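In current scikit-learn versions the DPGMM class has been replaced by BayesianGaussianMixture, so a present-day sketch of the three arguments above (α, the N×m data, and the cluster upper bound) looks roughly like this (synthetic data; the parameter values mirror those discussed above but are otherwise my own choices):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for the song feature matrix: N songs x m features,
# drawn from two well-separated Gaussians (so 2 "genres" exist).
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(8.0, 1.0, size=(100, 2))])

dpgmm = BayesianGaussianMixture(
    n_components=50,                                   # upper bound on clusters
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,                    # the concentration alpha
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
# Components with non-negligible weight are the clusters the DP actually used.
print("effective clusters:", np.sum(dpgmm.weights_ > 0.01))
```

Even with an upper bound of 50 components, the stick-breaking prior should concentrate the weight on the few clusters the data supports.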
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters,
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist is db9)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.86 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. the usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor for songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the prevalence of different tones, i.e., sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happy hardcore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second", so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature, since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly
we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
                 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
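A minimal sketch of this filtering step (the song records here are a simplified stand-in for the HDF5 fields, and the helper name is my own; only a subset of the genre list is shown):

```python
# Hypothetical, simplified song records standing in for MSD HDF5 files:
# each song carries the genre tags of its artist (artist_mbtags).
songs = [
    {"title": "Firestarter", "artist_mbtags": ["big beat", "electronic"]},
    {"title": "Mr. Blue Sky", "artist_mbtags": ["rock", "pop"]},
    {"title": "Strobe", "artist_mbtags": ["house", "progressive house"]},
]

target_genres = {"house", "techno", "jungle", "breakbeat", "trance",
                 "dubstep", "downtempo", "industrial", "synthpop",
                 "idm", "8-bit", "ambient", "electronic"}

def is_electronic(song):
    """Keep a song if ANY of its artist's tags is a target EM genre."""
    return any(tag.lower() in target_genres for tag in song["artist_mbtags"])

electronic_songs = [s for s in songs if is_electronic(s)]
print([s["title"] for s in electronic_songs])  # -> ['Firestarter', 'Strobe']
```

Note that this keeps "Strobe" via the artist-level "house" tag even though no song-level genre is available, which is exactly the artist-for-song substitution described above.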
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as
CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed against every template chord:

ρ_{CT,c} = (1/12) Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)

where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".
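The template-matching step can be sketched as follows (a sketch only: for simplicity it uses the Pearson correlation on the raw values rather than Spearman's rank correlation, and only two template chords):

```python
import math

# Template chords, with C as the first pitch-class index.
TEMPLATES = {
    "C major": [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],  # C, E, G
    "C minor": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],  # C, Eb, G
}

def correlation(x, y):
    """Pearson correlation between two 12-element vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def best_chord(chroma_frame):
    """Pick the template chord most correlated with the observed frame."""
    return max(TEMPLATES, key=lambda name: correlation(TEMPLATES[name], chroma_frame))

# A frame with strong C, E, G energy should match the C major template.
frame = [0.9, 0.1, 0.1, 0.2, 0.8, 0.1, 0.0, 0.9, 0.1, 0.1, 0.0, 0.1]
print(best_chord(frame))  # -> C major
```

In the full procedure, the highest-scoring template (over all 4 chord types in all 12 transpositions) is selected for each time frame before the chord-change counting begins.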
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I include a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: pitch-processing pipeline illustrated on "Firestarter" by The Prodigy. Starting from the raw pitch data, an N×12 matrix where N is the number of time frames and 12 the number of pitch classes, the pitch distributions are averaged over every block of 5 time frames; the most likely chord for each block is then calculated using Spearman's rho (e.g., F♯ major, (0,1,0,0,0,0,1,0,0,0,1,0)); for each pair of adjacent chords, the change between them (e.g., F♯ major to G major, a major-to-major change with step size 2, chord shift code 6) increments a count in a table of chord change frequencies (192 possible chord changes), yielding a final 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost definitely include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre components for each cluster formed.
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but with different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
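The BIC-based model selection described above can be sketched with scikit-learn (synthetic timbre frames here; the 10-to-100 sweep is coarsened to keep the example fast, and the toy data is built with 3 true clusters):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy stand-in for the 16,800 x 12 matrix of sampled timbre frames:
# three well-separated Gaussian "timbre types" in 12 dimensions.
frames = np.vstack([rng.normal(mu, 0.5, size=(200, 12)) for mu in (0.0, 4.0, 8.0)])

# Sweep candidate cluster counts and keep the model with the lowest BIC.
best_k, best_bic, best_model = None, np.inf, None
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

print("clusters chosen by BIC:", best_k)
# The timbre "lexicon" is then the mean vector of each chosen cluster.
lexicon = best_model.means_  # shape (best_k, 12)
```

Each incoming timbre frame is then assigned to its most likely lexicon entry (via `best_model.predict`), and the per-song counts of these assignments form the 46-element timbre feature vector.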
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup. The pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
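A sketch of this weighting-by-duplication (feature counts as in the text; the duplication factor of 4, which brings timbre from 46 to 184 features, roughly on par with pitch's 192, is my own illustrative choice):

```python
import numpy as np

def build_features(chord_changes, timbre_counts, timbre_copies=4):
    """Concatenate pitch and timbre features, duplicating the timbre vector
    so the two feature groups carry comparable weight in clustering."""
    return np.concatenate([chord_changes] + [timbre_counts] * timbre_copies)

chord_changes = np.zeros(192)   # per-second chord change frequencies
timbre_counts = np.zeros(46)    # per-second timbre cluster frequencies
features = build_features(chord_changes, timbre_counts)
print(features.shape)  # -> (376,)
```

Since the DPGMM measures distances over the whole feature vector, repeating the timbre block effectively multiplies timbre's contribution to every pairwise distance by the number of copies.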
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor. After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s, like Electric Light Orchestra, which had some electronic music but
28
mostly featured rock funk disco or another genre Given that these artists featured
mostly non-electronic songs I decided to exclude them from my study and generate
a blacklist indicating these music artists While it was infeasible to look through
every single song and determine whether it was electronic or not I was able to look
over the earliest songs in each cluster These songs were the most important to verify
as electronic because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for
The goal of this thesis is to identify different groups in which EM songs are
clustered and to identify the most unique artists and genres. While the second task is
fairly simple, because it requires looking at the earliest songs in each cluster, the
effectiveness of the first is difficult to gauge. While I can look at the average chord change
and timbre category frequencies in each cluster, as well as other metadata, attaching
semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clustering for each value, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit of components, or clusters, allowed to 50. The values of α I used
resulted in 9, 14, and 19 clusters, respectively.
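A run for one value of α can be sketched as follows. The DPGMM class used in the thesis has since been removed from scikit-learn, so this sketch uses its modern replacement, BayesianGaussianMixture; the scaling constant k and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def cluster_songs(features, alpha, k=10.0, max_components=50, seed=0):
    """Fit a Dirichlet Process GMM at concentration alpha.

    The features are rescaled by a constant k so that alpha can stay in a
    sensible range, as discussed above. BayesianGaussianMixture with a
    'dirichlet_process' prior is the modern scikit-learn counterpart of
    the DPGMM class used in the thesis.
    """
    model = BayesianGaussianMixture(
        n_components=max_components,
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,
        random_state=seed,
    )
    return model.fit_predict(np.asarray(features) * k)

# Toy data: two well-separated blobs should land in disjoint clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.05, (40, 3)), rng.normal(5, 0.05, (40, 3))])
labels = cluster_songs(X, alpha=0.1, max_components=5)
```

Sweeping `alpha` over (0.05, 0.1, 0.2) and counting the distinct labels reproduces the experiment's structure: larger α lets the process spread the songs over more clusters.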
3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are
skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions
of pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is
shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; new wave rock in 80s, then starting in 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
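The per-cluster average profiles shown in the timbre and pitch distribution figures can be computed with a simple group-by mean over the cluster assignments; a minimal sketch with toy data:

```python
import numpy as np

def cluster_profiles(features, labels):
    """Average each feature over the songs assigned to each cluster.

    These per-cluster means are what the timbre/pitch distribution plots
    summarise: one average chord-change and timbre profile per cluster.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

# Toy example: two clusters with obviously different profiles.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
labs = [0, 0, 1]
profiles = cluster_profiles(feats, labs)
# profiles[0] → [1.0, 0.0]; profiles[1] → [0.0, 2.0]
```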
3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and they did not sound unique, so
I discarded them from the clusters). Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them,
for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s, and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process these songs were lumped together with the common theme of dense, melodic
arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and
electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note
rhythm that, combined with the ethereal synths and their chord progressions,
sounded very similar to trance music. In fact, I found that stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres.

Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change; type 60, minor → minor with no note change; type
120, dominant 7th major → dominant 7th major with no note change; and type 180,
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that adjacent chords in the song are remaining in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to intuitively interpret. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal
it was not practical in mine. I ended up comparing my subjective
summaries of each cluster against the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data are very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
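The chord-change type numbers cited in the analysis (0, 60, 120, 180) come from the encoding in Appendix A.2, where chord_shift = 12*(key_shift - 1) + note_shift and key_shift = 4*(from_kind - 1) + to_kind. A small decoder written against that encoding confirms the interpretation of the four "no change" categories:

```python
def decode_chord_change(idx):
    """Invert the chord-change encoding used in Appendix A.2.

    idx = 12*(key_shift - 1) + note_shift, where
    key_shift = 4*(from_kind - 1) + to_kind and the chord kinds run 1..4.
    """
    kinds = ["major", "minor", "dominant 7th major", "dominant 7th minor"]
    key_shift0, note_shift = divmod(idx, 12)    # 0-based key shift, 0..15
    from_kind, to_kind = divmod(key_shift0, 4)  # 0-based chord kinds
    return kinds[from_kind], kinds[to_kind], note_shift

# The four "same chord kind, no note change" categories:
# 0 -> major->major, 60 -> minor->minor,
# 120 -> dom. 7th major->dom. 7th major, 180 -> dom. 7th minor->dom. 7th minor.
```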
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value but also comparing those clusters
to the ones formed at other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster
3 (α = 0.1) contained a similar number of songs and a similar
distribution of release years to cluster 9 (α = 0.05). Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05).
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin", for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 clusters differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 clusters picked
up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1)
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together based on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables.

The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR drum machines, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this run, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters; 3 of these clusters contained only one song each, and upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters from the other two α values, like cluster 17 (α = 0.2), which contains Roland TR
drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
seem to vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing different clusters: the y-axis values on all of the charts are quite small,
implying that many of the timbre values averaged out because the songs in each cluster were quite
different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions
were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small amount of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
adequately clustering the songs.

Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table for α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings for
α = 0.05 and α = 0.1.
Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; then I offer potential paths for researchers to build upon my
experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there
were various factors working against it, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to all of the electronic
music that was available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included contained
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists contain only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayes Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the measurements on each graph for timbre, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successfully the Dirichlet
Process clusters songs into distinct categories.
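The BIC-based choice of the number of timbre categories described above can be sketched as a standard mixture-size sweep. This is an illustrative reconstruction with toy data, not the actual selection code used for the 46 categories:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_component_count(samples, candidates, seed=0):
    """Pick the mixture size with the lowest Bayesian Information Criterion.

    Fits one Gaussian mixture per candidate size and keeps the size whose
    BIC is smallest, trading fit quality against model complexity.
    """
    samples = np.asarray(samples)
    scores = {}
    for k in candidates:
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(samples)
        scores[k] = gmm.bic(samples)
    return min(scores, key=scores.get), scores

# Toy data drawn from two clear components: BIC should prefer k = 2.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(-4, 0.3, (150, 2)), rng.normal(4, 0.3, (150, 2))])
best_k, _ = best_component_count(data, candidates=[1, 2, 4])
```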
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions, procedures,
evaluations of whether clustering was effective, and closer musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD, and constantly updating it, might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, and the dataset,
the songs accessed from the dataset, and methods for comparing songs to each other are
in place, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even further, and groups such as Spotify amass greater amounts of information
and deeper insights on that information, this relatively new field of study will hopefully
grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A

Code

A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = 'h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*.' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_match in re.finditer("{'title'.*?}", json_contents):
    json_object_str = str(json_object_match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre feature over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in
                            segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # center on the input vector's mean, mirroring find_most_likely_chord
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(timbre_vector))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec. 2005.
model is history-agnostic and may not capture the historical context of songs when
clustering. However, I believe that there is still significant merit to my research.
Instead of classifying genres of music by the early genres that led to them, my approach
gives the most credit to the artists and songs that were the most innovative for their
time, and perhaps reveals musical styles that are more similar to each other
than history would otherwise imply. This way of thinking about music genres, while
unconventional, is another way of imagining EM.
The practice of quantitatively analyzing music has exploded in the last decade,
thanks to technological and algorithmic advances that allow data scientists to
constructively sift through troves of music and listener information. In the literature
review I will focus on two particular organizations that have contributed greatly to
the large-scale mathematical analysis of music: Pandora, a website that plays songs
similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics
firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly
feature [3]. After evaluating the relevance of these sources to my thesis work, I will
then look over the relevant academic research and evaluate what this research can
contribute.
1.2 Literature Review
The quantitative analysis of music generally falls into two categories: research con-
ducted by academics and academic organizations for scholarly purposes, and research
conducted by companies and primarily targeted at consumers. Looking first at the
consumer-based research, Spotify and Pandora are two of the most prominent such
groups, and the two I decided to focus on. Spotify is a music streaming service where
users can listen to albums and songs from a wide variety of artists or listen to weekly
playlists generated based on the music that the user and the user's friends have listened to.
The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in
Spotify and is driven by music analysis algorithms created by Echo Nest. Using
the Echo Nest code interface, Spotify creates a "taste profile" for each user, which
assesses attributes such as how often the user branches out to new styles of music, how
closely the user's streamed music follows popular Billboard music charts, and so on.
Spotify also looks at the artists and songs the user has streamed and creates clusters
of different genres that the user likes (see figure 1.1). The taste profile and music
clusters can then be used to generate playlists geared to a specific user. The genres
in the clusters come from a list of nearly 800 names, which are derived by scraping
the Internet for trending terms in music as well as by training various algorithms on a
regular basis by "listening" to new songs [4][5].
Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the land-
scape of established and emerging genres of music, the methodology is limited to
pre-defined genres of music. This may serve as a good baseline to compare my
final results against, but my study aims to be as context-free as possible, attaching no
preconceived notions of music styles or genres and instead looking at features that can
be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on ex-
isting genres, Pandora takes a very low-tech and context-free approach to music
clustering. Pandora created the Music Genome Project, a multi-year undertaking
in which skilled music theorists listened to a large number of songs and analyzed up to
450 characteristics in each song [6]. Pandora's approach is appealing for the aim of
my study since it does not assume any preconceived notion of what a genre of music
is, instead comparing songs on common characteristics such as pitch, rhythm, and
instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists
at my disposal, nor do I have 10 years to perform such calculations like the dedicated
workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music
Genome Project is intellectual property, so at best I can rely only on the abstract
concepts of the Music Genome Project to drive my study.
In the academic realm there are no existing studies analyzing quantifiable changes in
EM specifically, but there exist a few studies that perform such analysis on popular
Western music in general. One such study is Measuring the Evolution of Contem-
porary Western Popular Music, which analyzes music from 1955-2010 spanning all
common genres. Using the Million Song Dataset, a free public database of songs,
each containing metadata (see section 1.3), the study focuses on the attributes pitch,
timbre, and loudness. Pitch is defined as the standard musical notes, or the frequency of
the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients
(MFCCs) of a transformed sound signal. More informally, it refers to the sound color,
texture, or tone quality, and is associated with instrument types, recording resources,
and production techniques. In other words, two sounds that have the same pitch
but different tones (for example, a bell and a voice) are differentiated by their timbres.
There are 12 MFCCs that define the timbre of a given sound. Finally, loudness
refers to how intrinsically loud the music sounds, not loudness that a listener can
manipulate while listening to the music; loudness is the first MFCC of the timbre
of a sound [7]. The study concluded that over time music has been becoming louder
and less diverse:
The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels, could be easily perceived as novel, fashionable, and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in
a few ways. First, it utilizes the Million Song Dataset, which addresses the issue
of legally obtaining music metadata. As mentioned in section 1.3, the only legal
way to obtain playable music for this study would have been to purchase every song I
would include, which is infeasible. While the Million Song Dataset does not contain
the audio files in playable format, it does contain audio features and metadata that
allow for in-depth analysis. In addition, working with the dataset takes out the
work of extracting features from raw audio files, saving an extensive amount of time
and energy. Second, the study establishes specifics for what constitutes a trend
in music. Pitch, timbre, and loudness are core features of music, and examining the
distributions of each among songs over time reveals a lot of information about how
the music industry and consumers' tastes have evolved. While these are not all of the
features contained in a song, they serve as a good starting point. Third, the study
defines mathematical ways to capture music attributes and measure their change
over time. For example, pitches are transposed into the same tonal context, with
binary discretized pitch descriptions based on a threshold, so that each song can be
represented with vectors of pitches that are normalized and compared to other songs.
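This normalization can be sketched in a few lines; the function names and the 0.5 threshold below are illustrative placeholders, not values taken from either study:

```python
def transpose_to_c(pitch_vector, key):
    """Rotate a 12-element chroma vector so the song's key maps to C.

    `key` is an integer 0-11 (0 = C natural); this mirrors the
    transposition of all songs into a common tonal context.
    """
    return [pitch_vector[(i + key) % 12] for i in range(12)]

def discretize(pitch_vector, threshold=0.5):
    """Binarize relative pitch strengths against a fixed threshold
    (the threshold value here is a placeholder)."""
    return [1 if p >= threshold else 0 for p in pitch_vector]

# a G-major-flavored segment (strong G, B, D) from a song in G (key = 7)
segment = [0.1, 0.0, 0.6, 0.0, 0.1, 0.1, 0.0, 0.9, 0.0, 0.1, 0.0, 0.8]
print(discretize(transpose_to_c(segment, key=7)))
# -> [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  (the strong pitches now sit on C, E, G)
```

Two songs preprocessed this way can be compared directly, since both are reduced to key-independent binary pitch vectors.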
While this study lays some solid groundwork for capturing and analyzing nu-
meric qualities of music, it falls short of addressing my goals in a couple of ways.
First, it does not perform any analysis with respect to music genre. While the
analysis performed in this paper could easily be applied to a list of songs in a specific
genre, certain genres might have unique sounds and rhythms relative to other genres
that would be worth studying in greater detail. Second, the study only measures
general trends in music over time. The models used to describe changes are simple
regressions that don't look at more nuanced changes. For example, what styles of
music developed over certain periods of time? How rapid were those changes? Which
styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes
contemporary popular Western music from the 1960s to the 2010s by comparing numer-
ical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the
Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution
of Contemporary Western Popular Music, Mauch's study also creates abstractions
of pitch and timbre in order to provide a consistent and meaningful semantic inter-
pretation of musical data (see figure 1.2). However, Mauch's study takes this idea a
step further by using genre tags from Last.fm, a music website, and constructing a
hierarchy of music genres using hierarchical clustering. Additionally, the study takes
a crack at determining whether a particular band, the Beatles, was musically ground-
breaking for its time or merely playing off sounds that other bands had already used.
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975
While both Measuring the Evolution of Contemporary Western Popular Music
and Mauch's study created abstractions of pitch and timbre, Mauch's study is more
appealing with respect to my goal because its end results align more closely with
mine. Additionally, the data processing pipeline offers several layers of abstraction,
and depending on my progress I would be able to achieve at least one of these levels of
abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken
down into its 12 timbre MFCCs and pitch components. Next, the study constructs
"lexicons," or dictionaries of pitch and timbre terms to which all songs can be compared.
For pitch, the original data is an N-by-12 matrix, where N is the number of time
segments in the song and 12 the number of notes found in an octave of
pitches. Each time segment contains the relative strengths of each of the 12 pitches.
However, music sounds are not merely a collection of pitches but, more precisely,
chords. Furthermore, the similarity of two songs is not determined by the absolute
pitches of their chords but rather by the progression of chords in the song, all relative to
each other. For example, if all the notes in a song are transposed by one step, the song
will sound different in terms of absolute pitch, but it will still be recognized
as the original because all of the relative movements from each chord to the next
are the same. This phenomenon is captured in the pitch data by finding the most
likely chord played at each time segment, then counting the change to the next chord
at each time step and generating a table of chord-change frequencies for each song.
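A minimal sketch of this counting step (the chord labels are hypothetical; the (quality, root) encoding mirrors the one in the appendix code, where quality runs 1-4 over major/minor/dominant-7th/minor-7th chords and root runs 0-11):

```python
from collections import Counter

def chord_shift(c1, c2):
    """Encode one chord change as an integer category in [0, 192)."""
    note_shift = (c2[1] - c1[1]) % 12      # root movement in semitones
    key_shift = 4 * (c1[0] - 1) + c2[0]    # chord-quality transition, 1..16
    return 12 * (key_shift - 1) + note_shift

# hypothetical per-segment chord labels for a short song
chords = [(1, 0), (1, 0), (2, 9), (1, 5), (1, 0)]
table = Counter(chord_shift(a, b) for a, b in zip(chords, chords[1:]))
# `table` is the song's chord-change frequency table
```

Because only relative movements are counted, a transposed copy of the same song produces the same table.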
Constructing the timbre lexicon is more complicated, since there is no easy analogue
to chords that can be used to compare songs. Mauch's study utilizes a Gaussian Mixture
Model (GMM), iterating over k = 1 to k = N clusters, where N is a large number,
running the GMM for each prior assumption of k clusters and computing the Bayesian
Information Criterion (BIC) for each model. The lowest of the N BIC values is found,
and that value of k is selected. The resulting model contains k different timbre clusters,
and each cluster contains the mean timbre value for each of the 12 timbre components.
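Under the assumption that scikit-learn's GaussianMixture is an acceptable stand-in for the GMM used in the study, the selection loop looks roughly like this (toy 12-dimensional data, and a small cap on k for speed):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# toy stand-in for 12-dimensional timbre frames: two well-separated blobs
X = np.vstack([rng.normal(0.0, 1.0, (200, 12)),
               rng.normal(5.0, 1.0, (200, 12))])

best_k, best_bic, best_model = None, np.inf, None
for k in range(1, 6):                     # the study iterates up to a large N
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic = gmm.bic(X)                      # lower BIC = better penalized fit
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

# best_model.means_ holds one mean timbre vector per selected cluster
```

Each row of `best_model.means_` then plays the role of one timbre "word" in the lexicon.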
For my research I decided that the pitch and timbre lexicons would be the most
realistic level of abstraction I could obtain. Mauch's study adds an additional layer
to pitch and timbre by identifying the most common patterns of chord changes and
most common timbre rhythms and creating more general tags from these combined
terms, such as "stepwise changes indicating modal harmony" for a pitch topic and
"oh, rounded, mellow" for a timbral topic. There were two problems with using this
final layer of abstraction in my study. First, attaching semantic interpretations to
the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen
to sound samples containing all of the different timbral categories I identified and
attach user interpretations to them. For the chords, not only would I have to
perform the same analysis as on timbre, but I would also have to pay careful attention to
identifying which chords correspond to common sound progressions in popular music, a task
that I am not qualified for and did not have the resources for this thesis to seek out. Second,
this final layer of abstraction was not necessary for the end goal of my paper. In
fact, consolidating my pitch and timbre lexicons into simpler phrases would run the
risk of pigeonholing my analysis and preventing me from discovering more nuanced
patterns in my final results. Therefore, I decided to treat pitch and timbral
lexicon construction as the furthest levels of abstraction when processing songs for
my thesis. Mathematical details on how I constructed the pitch and timbral lexicons
can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis I need access to an extensive database of
music Until recently acquiring a substantial corpus of music data was a difficult and
costly task It is illegal to download music audio files from video and music-sharing
sites such as YouTube Spotify and Pandora Some platforms such as iTunes offer
90-second previews of songs but using only segments of a song and usually segments
that showcase the chorus is not a reliable way to capture the entire essence of the
song Even if I were to legally download entire audio files for free I would
run into additional issues Obtaining a high-quality corpus of song data would be
challenging since scripts that crawl music-sharing platforms may not capture all of
the music I am looking for And once I have the audio files I would have to perform
audio processing techniques to extract the relevant information from the songs
Fortunately there is an easy solution to the music data acquisition problem
The Million Song Dataset (MSD) is a collection of metadata for one million music
tracks dating up to 2011 Various organizations such as The Echo Nest, Musicbrainz,
7digital and Last.fm have contributed different pieces of metadata Each song is
represented as a Hierarchical Data Format file (HDF5) which can be loaded as a
JSON object The fields encompass topical features such as the song title artist
and release date as well as lower-level features such as the loudness starting beat
time pitches and timbre of several segments of the song [9] While the MSD is
the largest free and open source music metadata dataset I could find there is no
guarantee that it adequately covers the entire spectrum of EM artists and songs
This quality limitation is important to consider throughout the study A quick look
through the songs including the subset of data I worked with for this report showed
that there were several well-known artists and songs in the EM scene Therefore
while the MSD may not contain all desired songs for this project it contains an
adequate number of relevant songs to produce some meaningful results Additionally
laying the groundwork for modeling the similarities between songs and identifying
groundbreaking ones is the same regardless of the songs included and the following
methodologies can be implemented on any similarly-formatted dataset including one
with songs that might currently be missing in the MSD
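As a rough sketch of what loading one of these files looks like, the snippet below uses h5py with the dataset paths documented for the MSD; the mock file stands in for a real track and reproduces only the fields used later in this thesis:

```python
import h5py
import numpy as np

def read_song(path):
    """Pull the fields used in this study out of one MSD-style HDF5 file.
    Dataset paths follow the Million Song Dataset layout."""
    with h5py.File(path, "r") as h5:
        return {
            "pitches": h5["/analysis/segments_pitches"][:],  # (N, 12) chroma
            "timbre":  h5["/analysis/segments_timbre"][:],   # (N, 12) MFCC-like
            "duration": float(h5["/analysis/songs"]["duration"][0]),
        }

# build a tiny mock file so the sketch is self-contained
with h5py.File("song.h5", "w") as h5:
    h5["/analysis/segments_pitches"] = np.random.rand(935, 12)
    h5["/analysis/segments_timbre"] = np.random.rand(935, 12)
    h5["/analysis/songs"] = np.array([(211.69,)], dtype=[("duration", "f8")])

song = read_song("song.h5")
print(song["pitches"].shape, song["duration"])  # → (935, 12) 211.69
```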
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was and continues to be
an important aspect of my research My problem how to mathematically determine
which songs were unique for their time requires an algorithm in which each song is
introduced in chronological order either joining an existing category or starting a
new category based on its musical similarity to songs already introduced Clustering
algorithms like k-means or Gaussian Mixture Models (GMM) optimize the
partitioning of a dataset into a predetermined number of clusters and therefore
assume a fixed number of clusters While this process would work if we knew
exactly how many genres of EM existed if we guess wrong our end results may end
up with clusters that are wrongly grouped together or separated It is much better to
apply a clustering algorithm that does not make any assumptions about this number
One particularly promising process that addresses the issue of the number of
clusters is a family of algorithms known as Dirichlet Processes (DPs) DPs are
useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters and (2) by sorting the songs
in chronological order before running the algorithm and keeping track of which
songs are categorized under each cluster we can observe the earliest songs in each
cluster and consequently infer which songs were responsible for creating new
clusters The DP is controlled by a concentration parameter α The expected
number of clusters formed is directly
proportional to the value of α so the higher the value of α the more likely new
clusters will be formed [10] Regardless of the value of α as the number of data
points introduced increases the probability of a new group being formed decreases
That is a "rich get richer" policy is in place and existing clusters tend to grow in
size Tweaking the value of the tunable parameter α is an important part of the
study since it determines the flexibility given to forming a new cluster If the value
of α is too small then the criteria for forming clusters will be too strict and data
that should be in different clusters will be assigned to the same cluster On the other
hand if α is too large the algorithm will be too sensitive and assign similar songs to
different clusters
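Both the "rich get richer" behavior and the role of α can be illustrated with a small simulation of the Chinese Restaurant Process view of a DP; the function and parameter values here are illustrative, not part of the thesis code:

```python
import numpy as np

def crp_cluster_count(n, alpha, seed=0):
    """Simulate the Chinese Restaurant Process view of a Dirichlet
    Process: point i joins an existing cluster with probability
    proportional to that cluster's size, or opens a new cluster with
    probability proportional to alpha ('rich get richer')."""
    rng = np.random.default_rng(seed)
    sizes = []
    for _ in range(n):
        probs = np.array(sizes + [alpha], dtype=float)
        choice = rng.choice(len(probs), p=probs / probs.sum())
        if choice == len(sizes):
            sizes.append(1)        # new cluster formed
        else:
            sizes[choice] += 1     # existing cluster grows
    return len(sizes)

# for the same 1000 points, more clusters form as alpha grows
print(crp_cluster_count(1000, 0.1), crp_cluster_count(1000, 10.0))
```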
The implementation of the DP was achieved using scikit-learn's library and API for
the Dirichlet Process Gaussian Mixture Model (DPGMM) The DPGMM is the formal
name of the Dirichlet Process model used to cluster the data More specifically
scikit-learn's implementation of the DPGMM uses the Stick Breaking method
one of several equally valid methods to assign songs to clusters [11] While the
mathematical details for this algorithm can be found at the following citation [12]
the most important aspects of the DPGMM are the arguments that the user can
specify and tune The first of these tunable parameters is the value α which is the
same parameter as the α discussed in the previous paragraph As seen in Figure 2.1
on the right side properly tuning α is key to obtaining meaningful clusters The
center image has α set to 0.01 which is too small and results in all of the data being
formed under one cluster On the other hand the bottom-right image has the same
data set and α set to 100 which does a better job of clustering On a related note
the figure also demonstrates the effectiveness of the DPGMM over the GMM On the
left side clearly the dataset contains 2 clusters but the GMM on the top-left image
assumes 5 clusters as a prior and consequently clusters the data incorrectly while
the DPGMM manages to limit the data to 2 clusters
The second argument that the user inputs for the DPGMM is the data that
will be clustered The scikit-learn implementation takes the data in the format
of a nested list (N lists each of length m) where N is the number of data points
and m the number of features While the format of the data structure is relatively
straightforward choosing which numbers should be in the data was a challenge I
faced Selecting the relevant features of each song to be used in the algorithm will
be expounded upon in the next section "Feature Selection"
The last argument that a user inputs for the scikit-learn DPGMM implementation
is an argument indicating the upper bound for the number of clusters The
Dirichlet Process then determines the best number of clusters for the data between
1 and the upper bound Since the DPGMM is flexible enough to find the best value
I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to
modify the number of clusters formed
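For reference, the `DPGMM` class used here was later removed from scikit-learn; current releases expose the same stick-breaking Dirichlet Process through `BayesianGaussianMixture`, whose `weight_concentration_prior` plays the role of α. A minimal sketch on toy data (a 10-cluster upper bound is used for brevity; the thesis used 50):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# toy data: two well-separated clusters in a 2-feature space
X = np.vstack([rng.normal(0, 0.3, (200, 2)), rng.normal(4, 0.3, (200, 2))])

dpgmm = BayesianGaussianMixture(
    n_components=10,  # upper bound on clusters (the thesis used 50)
    weight_concentration_prior_type="dirichlet_process",  # stick breaking
    weight_concentration_prior=0.05,  # the concentration parameter alpha
    max_iter=500,
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
print(len(np.unique(labels)))  # number of clusters actually used
```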
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features
to be used for clustering In other words when we organize the songs into clusters
Figure 2.1: scikit-learn example of GMM vs DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and
intuitively logical In the Million Song Dataset [9] each song is represented as a
JSON object containing several fields These fields are candidate features to be used
in the Dirichlet algorithm Below is an example song "Never Gonna Give You Up"
by Rick Astley and the corresponding features
artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID
for this artist is db9)
artist_mbtags shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count shape = (4)
(raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name Rick Astley (artist name)
artist_playmeid 1338 (the ID of that artist on the service playme.com)
artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest
(number between 0 and 1))
artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest
(number between 0 and 1))
audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for
the analysis by The Echo Nest)
bars_confidence shape = (99) (confidence value (between 0 and 1) associated
with each bar by The Echo Nest)
bars_start shape = (99) (start time of each bar according to The Echo Nest this
song has 99 bars)
beats_confidence shape = (397) (confidence value (between 0 and 1) associated
with each beat by The Echo Nest)
beats_start shape = (397) (start time of each beat according to The Echo Nest
this song has 397 beats)
danceability 0.0 (danceability measure of this song according to The Echo Nest
(between 0 and 1 0 => not analyzed))
duration 211.69587 (duration of the track in seconds)
end_of_fade_in 0.139 (time of the end of the fade in at the beginning of the
song according to The Echo Nest)
energy 0.0 (energy measure (not in the signal processing sense) according to The
Echo Nest (between 0 and 1 0 => not analyzed))
key 1 (estimation of the key the song is in by The Echo Nest)
key_confidence 0.324 (confidence of the key estimation)
loudness -7.75 (general loudness of the track)
mode 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence 0.434 (confidence of the mode estimation)
release Big Tunes - Back 2 The 80s (album name from which the track was taken
some songs tracks can come from many albums we give only one)
release_7digitalid 786795 (the ID of the release (album) on the service
7digital.com)
sections_confidence shape = (10) (confidence value (between 0 and 1) associated
with each section by The Echo Nest)
sections_start shape = (10) (start time of each section according to The Echo
Nest this song has 10 sections)
segments_confidence shape = (935) (confidence value (between 0 and 1) asso-
ciated with each segment by The Echo Nest)
segments_loudness_max shape = (935) (max loudness during each segment)
segments_loudness_max_time shape = (935) (time of the max loudness
during each segment)
segments_loudness_start shape = (935) (loudness at the beginning of each
segment)
segments_pitches shape = (935 12) (chroma features for each segment (normal-
ized so max is 1))
segments_start shape = (935) (start time of each segment ( musical event or
onset) according to The Echo Nest this song has 935 segments)
segments_timbre shape = (935 12) (MFCC-like features for each segment)
similar_artists shape = (100) (a list of 100 artists (their Echo Nest ID) similar
to Rick Astley according to The Echo Nest)
song_hotttnesss 0.864248830588 (according to The Echo Nest when downloaded
(in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 and 1))
song_id SOCWJDB12A58A776AF (The Echo Nest song ID note that a song can
be associated with many tracks (with very slight audio differences))
start_of_fade_out 198.536 (start time of the fade out in seconds at the end
of the song according to The Echo Nest)
tatums_confidence shape = (794) (confidence value (between 0 and 1) associated
with each tatum by The Echo Nest)
tatums_start shape = (794) (start time of each tatum according to The Echo
Nest this song has 794 tatums)
tempo 113.359 (tempo in BPM according to The Echo Nest)
time_signature 4 (time signature of the song according to The Echo Nest ie
usual number of beats per bar)
time_signature_confidence 0.634 (confidence of the time signature estimation)
title Never Gonna Give You Up (song title)
track_7digitalid 8707738 (the ID of this song on the service 7digital.com)
track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track
on which the analysis was done)
year 1987 (year when this song was released according to musicbrainz.org)
When choosing features my main goal was to use features that would most
likely yield meaningful results yet also be simple and make sense to the average
person The definition of "meaningful" results is subjective as every music listener
has his or her own opinions as to what constitutes different types of music but some
common features most people tend to differentiate songs by are pitch rhythm and
the types of instruments used The following specific fields provided in each song
object fall under these three terms
Pitch
• segments_pitches a matrix of values indicating the strength of each pitch (or
note) at each discernible time interval
Rhythm
• beats_start a vector of values indicating the start time of each beat
• time_signature the time signature of the song
• tempo the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre a matrix of values indicating the distribution of MFCC-like
features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor for
songs since it reveals patterns of notes that occur Additionally other research
papers that quantitatively examine songs like Mauchrsquos look at pitch and employ a
procedure that allows all songs to be compared with the same metric Likewise
timbre is intuitively a reliable differentiating feature since it captures the prevalence
of different tones that is sounds that sound different despite having the same pitch
Therefore segments_timbre is another feature that is considered in each song
Finally we look at the candidate features for rhythm At first glance all of these
features appear to be useful as they indicate the rhythm of a song in one way or
another However none of these features are as useful as the pitch and timbre
features While tempo is one factor in differentiating genres of EDM and music in
general tempo alone is not a driving force of musical innovation Certain genres
of EDM like drum 'n' bass and happy hardcore stand out for having very fast tempos
but the tempo is supplemented with a sound unique to the genre Conceiving new
arrangements of pitches combining instruments in new ways and inventing new
types of sounds are novel but speeding up or slowing down existing sounds is not
Including tempo as a feature could actually add noise to the model since many genres
overlap in their tempos And finally tempo is measured indirectly when the pitch
and timbre features are normalized for each song everything is measured in units of
ldquoper secondrdquo so faster songs will have higher quantities of pitch and timbre features
each second Time signature can be dismissed from the candidate features for the
same reason as tempo many genres contain the same time signature and including
it in the feature set would only add more noise beats_start looks like a more
promising feature since like segments_pitches and segments_timbre it consists of
a vector of values However difficulties arise when we begin to think how exactly
we can utilize this information Since each song varies in length we need a way to
compare songs of different durations on the same level One approach could be to
perform basic statistics on the distance between each beat for example calculating
the mean and standard deviation of this distance However the normalized pitch
and timbre information already capture this data Another possibility is detecting
certain patterns of beats which could differentiate the syncopated dubstep or glitch
music beats from the steady pulse of electro-house But once again every beat is
accompanied by a sound with a specific timbre and pitch so this feature would not
add any significantly new information
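The basic-statistics approach considered above would look something like this; the beats_start values are made up for illustration:

```python
import numpy as np

# hypothetical beats_start vector: start time (seconds) of each beat
beats_start = np.array([0.0, 0.52, 1.05, 1.57, 2.10, 2.62])

intervals = np.diff(beats_start)  # gaps between consecutive beats
print(round(intervals.mean(), 3), round(intervals.std(), 3))  # → 0.524 0.005
```

A very steady four-on-the-floor track would show a near-zero standard deviation, while syncopated beats would spread the intervals out; as noted above, though, this information is already implicit in the normalized pitch and timbre features.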
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research I first needed to collect
all of the electronic songs in the Million Song Dataset The easiest reliable way to
achieve this was to iterate through each song in the database and save the information
for the songs where any of the artist genre tags in artist_mbtags matched with an
electronic music genre While this measure was not fully accurate, because it looks at
the genre of the artist and not the song, specific genre information for each song was not
as easily accessible so this indicator was nearly as good a substitute To generate a
list of the genres that electronic songs would fall under I manually searched through
a subset of the MSD to find all genres that seemed to be related to electronic music
In the case of genres that were sometimes but not always electronic in nature such
as disco or pop I erred on the side of caution and did not include them in the list
of electronic genres In these cases false positives such as primarily rock songs that
happen to have the disco label attached to the artist could inadvertently be included
in the dataset The final list of genres is as follows
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
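The tag-matching filter described above can be sketched as follows; `is_electronic` is a hypothetical helper name, and the tags would in practice come from each song's artist_mbtags field:

```python
# a subset-matching filter over the artist-level genre tags
target_genres = {'house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic'}

def is_electronic(artist_mbtags):
    """Keep a song if any of its artist's musicbrainz tags matches an
    electronic genre (artist-level tags stand in for song-level genre)."""
    return any(tag.lower() in target_genres for tag in artist_mbtags)

print(is_electronic(['synthpop', 'british']))  # → True
print(is_electronic(['rock', 'blues']))        # → False
```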
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a
musically informed manner The study first takes the raw sound data and converts it into
a distribution of each pitch where 0 is no detection of the pitch and 1 the strongest
amount Then it computes the most likely chord by comparing the 4 most
common types of chords in popular music (major minor dominant 7 and minor 7) to
the observed chord The most common chords are represented as "template chords"
and contain 0's and 1's where the 1's represent the notes played in the chord For
example using the note C as the first index the C major chord is represented as
CT_CMaj = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song the Spearman's rho coefficient is
computed over every template chord

\rho_{CT,c} = \frac{\sum_{i=1}^{12} (CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}
where \overline{CT} is the mean of the values in the template chord \sigma_{CT} is the standard
deviation of the values in the chord and the operations on c are analogous Note
that the summation is over each individual pitch in the 12 pitch classes The chord
template with the highest value of ρ is selected as the chord for the time frame
After this is performed for each time frame the values are smoothed and then the
change between adjacent chords is observed The reasoning behind this step is that
by measuring the relative distance between chords rather than the chords themselves
all songs can be compared in the same manner even though they may have different
key signatures Finally the study takes the types of chord changes and classifies
them under 8 possible categories called "H-topics" These topics are more abstracted
versions of the chord changes that make more sense to a human such as "changes
involving dominant 7th chords"
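A sketch of the template-matching step, implementing the correlation formula as displayed above with C at index 0; the template set (4 types rotated through 12 roots, 48 chords) follows the study's description, but the helper names are mine:

```python
import numpy as np

# interval patterns (semitones above the root) for the 4 template types
CHORD_TYPES = {
    "maj":  (0, 4, 7),
    "min":  (0, 3, 7),
    "dom7": (0, 4, 7, 10),
    "min7": (0, 3, 7, 10),
}

def templates():
    """All 48 template chords: 4 types rotated through 12 roots."""
    out = {}
    for name, pattern in CHORD_TYPES.items():
        for root in range(12):
            t = np.zeros(12)
            t[[(root + p) % 12 for p in pattern]] = 1.0
            out[(root, name)] = t
    return out

def best_chord(chroma):
    """Pick the template maximizing
    rho = sum_i (CT_i - mean(CT))(c_i - mean(c)) / (sigma_CT * sigma_c)."""
    def rho(ct, c):
        return np.sum((ct - ct.mean()) * (c - c.mean())) / (ct.std() * c.std())
    return max(templates().items(), key=lambda kv: rho(kv[1], chroma))[0]

# a frame where C, E and G dominate should be tagged C major (root 0)
frame = np.array([1.0, .1, .1, .1, .9, .1, .1, .95, .1, .1, .1, .1])
print(best_chord(frame))  # → (0, 'maj')
```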
In my preliminary implementation of this method on an electronic dance music
corpus I made a few modifications to Mauch's study First I smoothed out time
frames before computing the most probable chords rather than smoothing the most
probable chords I did this to save time and to reduce volatility in the chord
measurements Using Rick Astley's "Never Gonna Give You Up" as a reference
which contains 935 time frames and lasts 212 seconds 5 time frames is slightly
over 1 second and for preliminary testing appeared to be a good interval for each
chord changes into H-topics This decision also stemmed from time constraints since
deriving semantic chord meaning from EDM songs would require careful research
into the types of harmonies and sounds common in that genre of music Below I
included a high-level visualization of the pitch metadata found in a sample song
"Firestarter" by The Prodigy and how I converted the metadata into a chord change
vector that I could then feed into the Dirichlet Process algorithm
[Figure: converting pitch metadata into a chord change vector, illustrated on
"Firestarter" by The Prodigy. The pipeline starts with the raw pitch data, an
N×12 matrix where N is the number of time frames in the song and 12 the number
of pitch classes; averages the distribution of pitches over every 5 time frames;
calculates the most likely chord for each block using Spearman's rho (e.g., F major,
template (0,1,0,0,0,0,1,0,0,0,1,0)); for each pair of adjacent chords, computes the
change between them (e.g., F major to G major is a major-to-major change with
step size 2, chord shift code 6, so chord_changes[6] += 1); and accumulates the
counts into a final 192-element vector where chord_changes[i] is the number of
times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the numbers by
the length of the song so that each song's number of chord changes was measured per
second
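Putting the counting and normalization steps together, the chord-change vector can be sketched as below. The mapping from a (type, type, root step) triple to one of the 192 indices (4 × 4 × 12 = 192 possible changes) is an assumed encoding, not necessarily the thesis's actual layout:

```python
import numpy as np

TYPES = ["maj", "min", "dom7", "min7"]  # 4 chord qualities

def chord_change_vector(chords, duration):
    """Count each (type_from, type_to, root_step) transition into a
    192-element vector, then divide by the song length so the
    frequencies are per second."""
    counts = np.zeros(192)
    for (r1, t1), (r2, t2) in zip(chords, chords[1:]):
        step = (r2 - r1) % 12  # key-independent root movement
        code = (TYPES.index(t1) * 4 + TYPES.index(t2)) * 12 + step
        counts[code] += 1
    return counts / duration

# toy progression: C maj -> G maj -> A min, in a 4-second snippet
chords = [(0, "maj"), (7, "maj"), (9, "min")]
vec = chord_change_vector(chords, duration=4.0)
print(vec.sum())  # two changes over four seconds → 0.5
```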
2.3.3 Timbre Preprocessing
For timbre I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8] After collecting all song metadata I took a random
sample of 20 songs from each year starting at 1970 The reason I forced the sampling
to 20 randomly sampled songs from each year and did not take a random sample of
songs from all years at once was to prevent bias towards any type of sounds As seen
in figure 22 there are significantly more songs from 2000-2011 than before 2000 The
mean year is x = 2001052 the median year is 2003 and the standard deviation of the
years is σ = 7060 A ldquorandom samplerdquo over all songs would almost definitely include
a disproportionate amount of more recent songs In order to not miss out on sounds
that may be more prevalent in older songs I required a set number of songs from each
year Next from each randomly selected song I selected 20 random timbre frames
in order to prevent any biases in data collection within each song In total there
were 42 × 20 × 20 = 16,800 timbre frames collected Next I clustered the timbre frames
using a Gaussian Mixture Model (GMM) varying the number of clusters from 10 to
100 and selecting the number of clusters with the lowest Bayes Information Criterion
(BIC) a statistical measure commonly used to calculate the best fitting model The
BIC was minimized at 46 timbre clusters I then re-ran the GMM with 46 clusters
and saved the mean values of each of the 12 timbre segments for each cluster formed
In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs each song now had the same 46 timbre clusters
but different frequencies in each song When reading in the metadata from each song
I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song Finally
as with the pitch data I divided all observed counts by the duration of the song in
order to normalize each songrsquos timbre counts
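A condensed sketch of this pipeline on toy data (2 lexicon clusters instead of 46, made-up frames, and an illustrative helper name):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# stand-in for the 16,800 sampled frames; the real lexicon had 46 clusters
lexicon_frames = np.vstack([rng.normal(0, 0.2, (300, 12)),
                            rng.normal(3, 0.2, (300, 12))])
lexicon = GaussianMixture(n_components=2, random_state=0).fit(lexicon_frames)

def timbre_vector(song_frames, duration):
    """Assign each of a song's timbre frames to its most likely lexicon
    cluster, count the assignments, and normalize to counts per second."""
    labels = lexicon.predict(song_frames)
    counts = np.bincount(labels, minlength=lexicon.n_components)
    return counts / duration

song = rng.normal(0, 0.2, (100, 12))  # frames near the first blob
print(timbre_vector(song, duration=200.0))
```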
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed I ran the Dirichlet Process on the
data For each song I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list giving each song a total of 238
features However there is a problem with this setup The pitch data will inherently
dominate the clustering process since it contains almost 3 times as many features
as timbre While there is no built-in function in scikit-learn's DPGMM process to
give different weights to each feature I considered another possibility to remedy
this discrepancy duplicating the timbre vector a certain number of times and
concatenating that to the feature set of each song While this strategy runs the risk
of corrupting the feature set and turning it into something that does not accurately
represent each song it is important to keep in mind that even without duplicating
the timbre vector the feature set already consists of two separate feature sets
concatenated to each other Therefore timbre duplication appears to be a reasonable
strategy to weight pitch and timbre more evenly
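The resulting per-song feature assembly, with the timbre duplication described above and the constant rescaling the thesis settles on (a factor of 10), might look like this; the copy count of 4 is an illustrative choice:

```python
import numpy as np

def song_features(chord_changes, timbre_freqs, timbre_copies=4, scale=10.0):
    """Concatenate the 192 chord-change frequencies with several copies
    of the 46 timbre frequencies (so timbre is not swamped 192-to-46),
    then scale everything up so the DP's alpha can stay in a sane range."""
    timbre = np.tile(timbre_freqs, timbre_copies)
    return scale * np.concatenate([chord_changes, timbre])

x = song_features(np.zeros(192), np.ones(46))
print(x.shape, x.max())  # → (376,) 10.0
```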
After this modification I tweaked a few more parameters before obtaining my
final results Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second but it also had the undesired effect
of making the data too small Timbre and pitch frequencies per second were almost
always less than 10 and many times hovered as low as 0.002 for nonzero values
Because all of the values were very close to each other using common values of α
in the range of 0.1 to 1000-2000 was insufficient to push the songs into different
clusters As a result every song fell into the same cluster Increasing the value
of α by several orders of magnitude to well over 10 million fixed the problem but
this solution presented two problems First tuning α to experiment with different
ways to cluster the music would be problematic since I would have to work with
an enormous range of possible values for α Second pushing α to such high values
is not appropriate for the Dirichlet Process Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters but a value
of α that high is in principle always assigning each new song to a new cluster On
the other hand varying α between 0.1 and 1000 for example presents a much wider
range of flexibility when assigning clusters While this may be possible by varying
the values of α an extreme amount with the data as it currently is we are using
the Dirichlet Process in a way it should mathematically not be used Therefore
multiplying all of the data by a constant value so that we can work in the appropriate
range of α is the ideal approach After some experimentation I found that k = 10 was
an appropriate scaling factor After initial runs of the Dirichlet Process I found out
that there was a slight issue with some of the earlier songs Since I had only artist
genre tags not specific song tags for each song I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre including
the generic term 'electronic' There were some bands mostly older ones from the
1960s and 1970s like Electric Light Orchestra which had some electronic music but
mostly featured rock funk disco or another genre Given that these artists featured
mostly non-electronic songs I decided to exclude them from my study and generate
a blacklist indicating these music artists While it was infeasible to look through
every single song and determine whether it was electronic or not I was able to look
over the earliest songs in each cluster These songs were the most important to verify
as electronic because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for
The goal of this thesis is to identify different groups in which EM songs are
clustered and identify the most unique artists and genres While the second task is
very simple because it requires looking at the earliest songs in each cluster the first
is difficult to gauge the effectiveness of While I can look at the average chord change
and timbre category frequencies in each category as well as other metadata putting
more semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process For this reason
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clustering in each case examining similarities and differences in
the clusters formed in each scenario in the Discussion section For each value of α I
set the upper limit of components or clusters allowed to 50 These values of α
resulted in 9, 14, and 19 clusters formed respectively
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05 the Dirichlet Process split the songs into 9 clusters Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially so cluster numbers 5 7 and 10 are
skipped)
Figure 3.1: Song year distributions for α = 0.05
For each value of α I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results The green
lines correspond to timbre and the blue lines to pitch
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed the number of songs in that cluster and descriptions
of pitch timbre and rhythmic qualities characteristic of songs in that cluster are
shown below
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster paced, 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines/synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalist, repetitive, or dissonant ones). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster.
Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths that contain certain chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.
Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music.
In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.
The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From these clusters, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
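The per-cluster average frequencies and peak timbre categories used above amount to a simple group-by over cluster labels. A minimal Python 3 sketch with hypothetical toy labels and per-song category frequencies (not the thesis data):

```python
from collections import defaultdict

def cluster_mean_profiles(labels, features):
    """Average each feature column within each cluster label."""
    groups = defaultdict(list)
    for lab, feat in zip(labels, features):
        groups[lab].append(feat)
    profiles = {}
    for lab, rows in groups.items():
        ncols = len(rows[0])
        profiles[lab] = [sum(r[j] for r in rows) / len(rows) for j in range(ncols)]
    return profiles

def peak_category(profile):
    """Index of the dominant (highest average frequency) category."""
    return max(range(len(profile)), key=lambda j: profile[j])

# toy stand-ins: 4 songs with 3 category frequencies each, in 2 clusters
labels = [0, 0, 1, 1]
features = [[0.9, 0.1, 0.0], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.0, 0.3, 0.7]]
for lab, profile in cluster_mean_profiles(labels, features).items():
    print(lab, profile, peak_category(profile))
```

A cluster whose mean profile is dominated by one category, like cluster 28 above, shows up as a single large argmax entry.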
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters for the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters. The y-axis values on all of these charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs.
Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
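One way to make the cross-α comparisons above less subjective, a possible extension rather than something done in this thesis, is to cross-tabulate the two label assignments and report, for each cluster under one α, the cluster under the other α that absorbs the largest fraction of its songs. A sketch with hypothetical labels:

```python
from collections import Counter

def best_overlap(labels_a, labels_b):
    """For each cluster in labels_a, find the labels_b cluster sharing the most
    songs with it, and the fraction of overlap. Labels are aligned by song index."""
    pair_counts = Counter(zip(labels_a, labels_b))  # joint cluster counts
    size_a = Counter(labels_a)                      # cluster sizes under assignment A
    mapping = {}
    for (a, b), n in pair_counts.items():
        frac = n / size_a[a]
        if a not in mapping or frac > mapping[a][1]:
            mapping[a] = (b, frac)
    return mapping

# hypothetical labels for the same 8 songs under two alpha values
alpha_005 = [0, 0, 0, 0, 1, 1, 1, 1]
alpha_01 = [3, 3, 3, 5, 5, 5, 5, 5]
print(best_overlap(alpha_005, alpha_01))
```

Clusters that "map over nicely" would show overlap fractions near 1, while the hard-to-match specialized clusters would split across several targets.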
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level.
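A sketch of the song-level filter this would enable, assuming, hypothetically, that the Last.fm tags have already been parsed into a dict from track ID to tag list (the track IDs and tags below are made up, and the genre list is abbreviated from Appendix A.1):

```python
# abbreviated EM genre list from Appendix A.1
target_genres = {'house', 'techno', 'trance', 'dubstep', 'ambient', 'electronic'}

# hypothetical song-level tags, e.g. parsed from the Last.fm dataset
track_tags = {
    'TRAAAAA1': ['techno', 'dance'],
    'TRAAAAA2': ['rock', 'guitar'],    # non-EM song by an artist tagged as EM
    'TRAAAAA3': ['ambient', 'chillout'],
}

# keep only tracks whose own tags (not just their artist's) match an EM genre
em_tracks = {tid for tid, tags in track_tags.items()
             if any(t.lower() in target_genres for t in tags)}
print(sorted(em_tracks))
```

Filtering at the track level rather than the artist level would drop songs like the rock track above, even though its artist carries an EM tag.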
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.
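The BIC-based selection of the number of timbre categories mentioned above can be illustrated at small scale. The sketch below is not the thesis pipeline (which clustered 12-dimensional timbre vectors); it fits 1-D Gaussian mixtures with a hand-rolled EM on synthetic bimodal data and picks the component count minimizing BIC = -2 log L + p log n:

```python
import math
import random

def fit_gmm_1d(xs, k, iters=60):
    """Fit a k-component 1-D Gaussian mixture with EM; return the log-likelihood."""
    xs = sorted(xs)
    n = len(xs)
    # deterministic init: component means at evenly spaced quantiles
    mus = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    sigmas = [max((xs[-1] - xs[0]) / (2.0 * k), 1e-3)] * k
    weights = [1.0 / k] * k
    loglik = float('-inf')
    for _ in range(iters):
        # E-step: per-point responsibilities and total log-likelihood
        resp, loglik = [], 0.0
        for x in xs:
            ps = [w * math.exp(-(x - m) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
                  for w, m, s in zip(weights, mus, sigmas)]
            tot = sum(ps) or 1e-300
            loglik += math.log(tot)
            resp.append([p / tot for p in ps])
        # M-step: update weights, means, and standard deviations
        for j in range(k):
            rj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = rj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / rj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / rj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # variance floor avoids collapse
    return loglik

def bic(loglik, k, n):
    p = 3 * k - 1  # k-1 free weights, k means, k standard deviations
    return -2 * loglik + p * math.log(n)

rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(8, 1) for _ in range(200)]
scores = {k: bic(fit_gmm_1d(data, k), k, len(data)) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

On this clearly bimodal data, the penalty term p log n outweighs the tiny likelihood gain of a third component, so two components should win, mirroring how 46 timbre categories won over neighboring values in the thesis.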
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD and constantly updating it might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved (the dataset built, the songs accessible, and methods for comparing songs to each other in place), the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been mostly analyzed from a curious outsider's perspective: qualitatively described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for each electronic song out of the
MSD HDF5 files and writes it to disk, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
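The chord-change indexing in the listing above maps a pair of consecutive chords, each represented as (chord type, root note) with types 1-4 for major, minor, dominant 7th major, and dominant 7th minor, and roots 0-11 from C up to B, into one of 192 categories. A Python 3 restatement of that arithmetic, reproducing the no-note-change categories (0, 60, 120, 180) discussed in the Analysis section:

```python
def chord_shift_category(c1, c2):
    """Map two consecutive chords to one of 192 chord-change categories.

    Each chord is a (chord_type, root_note) pair: chord_type is 1..4
    (major, minor, dominant 7th major, dominant 7th minor) and root_note
    is 0..11 (C natural up to B natural)."""
    # semitone distance from the first root to the second, wrapping at 12
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    # 16 ordered pairs of chord types, numbered 1..16
    key_shift = 4 * (c1[0] - 1) + c2[0]
    return 12 * (key_shift - 1) + note_shift

# the "no note change" categories named in the Analysis section
print(chord_shift_category((1, 0), (1, 0)))  # major -> major: 0
print(chord_shift_category((2, 5), (2, 5)))  # minor -> minor: 60
print(chord_shift_category((3, 7), (3, 7)))  # dom 7th major -> dom 7th major: 120
print(chord_shift_category((4, 2), (4, 2)))  # dom 7th minor -> dom 7th minor: 180
```

Each block of 12 consecutive categories shares one ordered pair of chord types, with the offset inside the block giving the root-note shift in semitones.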
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8
9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10
11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural
12
61
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-03, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02, -1.17525019e-02, 5.70221674e-03, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-03, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
''' helper methods to process raw msd data '''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in
        segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01)
                * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
playlists generated based on the music the user and user's friends have listened to.
The weekly playlist, called Discover Weekly, is a relatively new feature in
Spotify and is driven by music analysis algorithms created from Echo Nest. Using
the Echo Nest code interface, Spotify creates a "taste profile" for each user, which
assesses attributes such as how often a user branches out to new styles of music, how
closely the user's music streamed follows popular Billboard music charts, and so on.
Spotify also looks at the artists and songs the user streamed and creates clusters
of different genres that the user likes (see Figure 1.1). The taste profile and music
clusters can then be used to generate playlists geared to a specific user The genres
in the cluster come from a list of nearly 800 names which are derived by scraping
the Internet for trending terms in music as well as training various algorithms on a
regular basis by "listening" to new songs [4][5].
Figure 1.1: A user's taste profile generated by Spotify
Although Spotify and Echo Nest's algorithms are very useful for mapping the land-
scape of established and emerging genres of music the methodology is limited to
pre-defined genres of music This may serve as a good starting point to compare my
final results to but my study aims to be as context-free as possible by attaching no
preconceived notions of music styles or genres instead looking at features that could
be measured in every song
While Spotify's approach to mapping music is very high-tech and based on ex-
isting genres Pandora takes a very low-tech and context-free approach to music
clustering Pandora created the Music Genome Project a multi-year undertaking
where skilled music theorists listened to a large number of songs and analyzed up to
450 characteristics in each song [6]. Pandora's approach is appealing to the aim of
my study since it does not take any preconceived notions of what a genre of music
is instead comparing songs on common characteristics such as pitch rhythm and
instrument patterns Unfortunately I do not have a cadre of skilled music theorists
at my disposal nor do I have 10 years to perform such calculations like the dedicated
workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music
Genome Project is intellectual property so at best I can only rely on the abstract
concepts of the Music Genome Project to drive my study
In the academic realm there are no existing studies analyzing quantifiable changes in
EM specifically but there exist a few studies that perform such analysis on popular
Western music in general One such study is Measuring the Evolution of Contem-
porary Western Popular Music which analyzes music from 1955-2010 spanning all
common genres Using the Million Song Dataset a free public database of songs
each containing metadata (see Section 1.3), the study focuses on the attributes pitch,
timbre and loudness Pitch is defined as the standard musical notes or frequency of
the sound waves Timbre is formally defined as the Mel frequency cepstral coefficients
(MFCC) of a transformed sound signal More informally it refers to the sound color
texture or tone quality and is associated with instrument types recording resources
and production techniques In other words two sounds that have the same pitch
but different tones (for example a bell and voice) are differentiated by their timbres
There are 12 MFCCs that define the timbre of a given sound Finally loudness
refers to intrinsically how loud the music sounds not loudness that a listener can
manipulate while listening to the music Loudness is the first MFCC of the timbre
of a sound [7] The study concluded that over time music has been becoming louder
and less diverse
The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in
a few ways First it utilizes the Million Song Dataset which addresses the issue
of legally obtaining music metadata. As mentioned in Section 1.3, the only legal
way to obtain playable music for this study would have been to purchase all songs I
would include which is infeasible While the Million Song Dataset does not contain
the audio files in playable format it does contain audio features and metadata that
allow for in-depth analysis In addition working with the dataset takes out the
work of extracting features from raw audio files saving an extensive amount of time
and energy Second the study establishes specifics for what constitutes a trend
in music. Pitch, timbre, and loudness are core features of music, and examining the
distributions of each among songs over time reveals a lot of information about how
the music industry and consumersrsquo tastes have evolved While these are not all of the
features contained in a song they serve as a good starting point Third the study
defines mathematical ways to capture music attributes and measure their change
over time For example pitches are transposed into the same tonal context with
binary discretized pitch descriptions based on a threshold so that each song can be
represented with vectors of pitches that are normalized and compared to other songs
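The transposition and binary discretization just described can be sketched in a few lines. This is my own illustration rather than code from the study; the function name and the 0.5 threshold are placeholders:

```python
import numpy as np

def discretize_and_transpose(pitch_frames, key, threshold=0.5):
    """Binarize N x 12 chroma frames and rotate them into a common tonal
    context (key of C), following the idea summarized above.
    `key` is the song's estimated key (0 = C, ..., 11 = B)."""
    frames = np.asarray(pitch_frames, dtype=float)
    binary = (frames >= threshold).astype(int)  # 1 = pitch class is active
    return np.roll(binary, -key, axis=1)        # rotate so the tonic is index 0
```

With this representation, two songs that differ only by a global transposition produce identical pitch vectors, which is exactly the property the comparison needs.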
While this study lays some solid groundwork for capturing and analyzing nu-
meric qualities of music it falls short of addressing my goals in a couple of ways
First it does not perform any analysis with respect to music genre While the
analysis performed in this paper could easily be applied to a list of songs in a specific
genre certain genres might have unique sounds and rhythms relative to other genres
that would be worth studying in greater detail Second the study only measures
general trends in music over time The models used to describe changes are simple
regressions that don't look at more nuanced changes. For example, what styles of
music developed over certain periods of time? How rapid were those changes? Which
styles of music developed from which other styles?
A more promising study led by music researcher Matthias Mauch [8] analyzes
contemporary popular Western Music from the 1960s to 2010s by comparing numer-
ical data on the pitches and timbre of a corpus of 17000 songs that appeared on the
Billboard Hot 100 Like the previously mentioned paper Measuring the Evolution
of Contemporary Western Popular Music, Mauch's study also creates abstractions
of pitch and timbre in order to provide a consistent and meaningful semantic inter-
pretation of musical data (see Figure 1.2). However, Mauch's study takes this idea a
step further by using genre tags from Lastfm a music website and constructing a
hierarchy of music genres using hierarchical clustering Additionally the study takes
a crack at determining whether a particular band the Beatles was musically ground-
breaking for its time or merely playing off sounds that other bands had already used
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)
While both Measuring the Evolution of Contemporary Western Popular Music
and Mauch's study created abstractions of pitch and timbre, Mauch's study is more
appealing with respect to my goal because its end results align more closely with
mine Additionally the data processing pipeline offers several layers of abstraction
and depending on my progress I would be able to achieve at least one of the levels of
abstraction. As shown in Figure 1.2, each segment of a raw audio file is first broken
down into its 12 timbre MFCCs and pitch components Next the study constructs
"lexicons", or a dictionary of pitch and timbre terms that all songs can be compared
to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time
segments in the song and 12 the number of each of the notes found in an octave of
pitches Each time segment contains the relative strengths of each of the 12 pitches
However music sounds are not merely a collection of pitches but more precisely
chords Furthermore the similarity of two songs is not determined by the absolute
pitches of their chords but rather the progression of chords in the song all relative to
each other For example if all the notes in a song are transposed by one step the song
will sound different in terms of absolute pitch but the song will still be recognized
as the original because all of the relative movements from each chord to the next
are the same This phenomenon is captured in the pitch data by finding the most
likely chord played at each time segment then counting the change to the next chord
at each time step and generating a table of chord change frequencies for each song
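The chord-change counting described above can be reduced to a short sketch. This is an illustrative Python 3 fragment with made-up chord labels, not code from Mauch's pipeline; my own appendix code works with (chord family, index) tuples instead:

```python
from collections import Counter

def chord_change_table(chords):
    """Turn a per-segment chord sequence into a table of chord-change
    frequencies, so songs of different lengths can be compared."""
    changes = Counter(zip(chords, chords[1:]))  # successive chord pairs
    total = sum(changes.values())
    return {pair: count / total for pair, count in changes.items()}

# Hypothetical chord labels for illustration only.
table = chord_change_table(["C", "G", "Am", "F", "C", "G", "F"])
```

Because only the relative movement between successive chords is recorded, the table is unchanged if every chord in the song is transposed by the same interval.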
Constructing the timbre lexicon is more complicated, since there is no easy analogue
like chords for pitches to compare songs. Mauch's study utilizes a Gaussian Mixture
Model (GMM) by iterating over k=1 to k=N clusters where N is a large number
running the GMM on each prior assumption of k clusters and computing the Bayes
Information Criterion (BIC) for each model The lowest of the N BIC values is found
and that value of k is selected That model contains k different timbre clusters
and each cluster contains the mean timbre value for each of the 12 timbre components
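The BIC-based selection rule just described can be sketched with scikit-learn. This is a toy illustration on synthetic data standing in for the N-by-12 timbre frames, not Mauch's actual code or data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy stand-in for the N x 12 timbre frames: two well-separated groups.
X = np.vstack([rng.normal(0.0, 1.0, (200, 12)),
               rng.normal(6.0, 1.0, (200, 12))])

# Fit a GMM for each candidate number of clusters and keep the model with
# the lowest Bayes Information Criterion, mirroring the selection procedure.
candidates = [GaussianMixture(n_components=k, random_state=0).fit(X)
              for k in range(1, 6)]
best = min(candidates, key=lambda m: m.bic(X))
```

The selected model's `means_` attribute then plays the role of the timbre lexicon: one 12-dimensional mean vector per cluster.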
For my research I decided that the pitch and timbre lexicons would be the most
realistic level of abstraction I could obtain. Mauch's study adds an additional layer
to pitch and timbre by identifying the most common patterns of chord changes and
most common timbre rhythms and creating more general tags from these combined
terms such as "stepwise changes indicating modal harmony" for a pitch topic and
"oh, rounded, mellow" for a timbral topic. There were two problems with using this
final layer of abstraction for my study First attaching semantic interpretations to
the pitch and timbral lexicons is a difficult task For timbre I would need to listen
to sound samples containing all of the different timbral categories I identified and
attach interpretations to them. For the chords, not only would I have to
perform the same analysis as on timbre, but also pay careful attention to identify which
chords correspond to common sound progressions in popular music, a task that I am
not qualified for and did not have the resources to pursue for this thesis. Second,
this final layer of abstraction was not necessary for the end goal of my paper In
fact consolidating my pitch and timbre lexicons into simpler phrases would run the
risk of pigeonholing my analysis and preventing me from discovering more nuanced
patterns in my final results Therefore I decided to focus on pitch and timbral
lexicon construction as the furthest levels of abstraction when processing songs for
my thesis. Mathematical details on how I constructed the pitch and timbral lexicons
can be found in the Mathematical Modeling section of this paper
1.3 The Dataset
In order to successfully execute my thesis I need access to an extensive database of
music Until recently acquiring a substantial corpus of music data was a difficult and
costly task It is illegal to download music audio files from video and music-sharing
sites such as YouTube Spotify and Pandora Some platforms such as iTunes offer
90-second previews of songs but using only segments of songs and usually segments
that showcase the chorus of the song are not reliable measures to capture the entire
essence of a song Even if I were to legally download entire audio files for free I would
run into additional issues. Obtaining a high-quality corpus of song data would be
challenging writing scripts that crawl music sharing platforms may not capture all of
the music I am looking for And once I have the audio files I would have to perform
audio processing techniques to extract the relevant information from the songs
Fortunately there is an easy solution to the music data acquisition problem
The Million Song Dataset (MSD) is a collection of metadata for one million music
tracks dating up to 2011 Various organizations such as The Echo Nest Musicbrainz
7digital and Lastfm have contributed different pieces of metadata Each song is
represented as a Hierarchical Data Format file (HDF5) which can be loaded as a
JSON object The fields encompass topical features such as the song title artist
and release date as well as lower-level features such as the loudness starting beat
time pitches and timbre of several segments of the song [9] While the MSD is
the largest free and open source music metadata dataset I could find there is no
guarantee that it adequately covers the entire spectrum of EM artists and songs
This quality limitation is important to consider throughout the study A quick look
through the songs including the subset of data I worked with for this report showed
that there were several well-known artists and songs in the EM scene Therefore
while the MSD may not contain all desired songs for this project it contains an
adequate number of relevant songs to produce some meaningful results Additionally
laying the groundwork for modeling the similarities between songs and identifying
groundbreaking ones is the same regardless of the songs included and the following
methodologies can be implemented on any similarly-formatted dataset including one
with songs that might currently be missing in the MSD
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was and continues to be
an important aspect of my research My problem how to mathematically determine
which songs were unique for their time requires an algorithm in which each song is
introduced in chronological order either joining an existing category or starting a
new category based on its musical similarity to songs already introduced Clustering
algorithms like k-means or Gaussian Mixture Models (GMM) which have a prede-
termined number of clusters and optimize the partitioning of a dataset into those
clusters, assume this number is known in advance. While this process would work if we knew
exactly how many genres of EM existed if we guess wrong our end results may end
up with clusters that are wrongly grouped together or separated It is much better to
apply a clustering algorithm that does not make any assumptions about this number
One particularly promising approach that addresses the issue of the number of
clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are
useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters and (2) by sorting the songs
in chronological order before running the algorithm and keeping track of which
songs are categorized under each cluster we can observe the earliest songs in each
cluster, and consequently infer which songs were responsible for creating new
clusters. The DP is controlled by a concentration parameter α.
The expected number of clusters formed is directly
proportional to the value of α so the higher the value of α the more likely new
clusters will be formed [10] Regardless of the value of α as the number of data
points introduced increases the probability of a new group being formed decreases
That is, a "rich get richer" policy is in place and existing clusters tend to grow in
size Tweaking the value of the tunable parameter α is an important part of the
study since it determines the flexibility given to forming a new cluster If the value
of α is too small then the criteria for forming clusters will be too strict and data
that should be in different clusters will be assigned to the same cluster On the other
hand if α is too large the algorithm will be too sensitive and assign similar songs to
different clusters
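The "rich get richer" dynamic and the role of α can be illustrated by simulating the Chinese Restaurant Process view of a DP. This is a self-contained sketch for intuition, not part of my pipeline:

```python
import random

def crp_partition(n, alpha, seed=0):
    """Simulate cluster sizes under the Chinese Restaurant Process view of
    a DP: point i joins an existing cluster with probability proportional
    to that cluster's size, or starts a new one proportional to alpha."""
    rng = random.Random(seed)
    sizes = []
    for i in range(n):
        r = rng.uniform(0, i + alpha)
        if r < alpha or not sizes:
            sizes.append(1)  # start a new cluster
        else:
            acc = alpha
            for j, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[j] += 1  # rich get richer
                    break
            else:
                sizes[-1] += 1  # guard against the r == i + alpha edge case
    return sizes
```

Running this with n = 500 typically gives on the order of one cluster for α = 0.1 but dozens for α = 10, matching the intuition that larger α makes new clusters more likely while early clusters still tend to dominate.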
The implementation of the DP was achieved using scikit-learn's library and API for
Dirichlet Process Gaussian Mixture Model (DPGMM) The DPGMM is the formal
name of the Dirichlet Process model used to cluster the data More specifically
scikit-learnrsquos implementation of the DPGMM uses the Stick Breaking method
one of several equally valid methods to assign songs to clusters [11] While the
mathematical details for this algorithm can be found at the following citation [12]
the most important aspects of the DPGMM are the arguments that the user can
specify and tune The first of these tunable parameters is the value α which is the
same parameter as the α discussed in the previous paragraph As seen in Figure 21
on the right side properly tuning α is key to obtaining meaningful clusters The
center image has α set to 0.01, which is too small and results in all of the data being
grouped under one cluster. On the other hand, the bottom-right image has the same
data set and α set to 100, which does a better job of clustering. On a related note,
the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the
left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image
assumes 5 clusters as a prior and consequently clusters the data incorrectly, while
the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs for the DPGMM is the data that
will be clustered. The scikit-learn implementation takes the data in the format
of a nested list (N lists, each of length m), where N is the number of data points
and m the number of features. While the format of the data structure is relatively
straightforward, choosing which numbers should be in the data was a challenge I
faced. Selecting the relevant features of each song to be used in the algorithm will
be expounded upon in the next section, "Feature Selection."
The last argument that a user inputs for the scikit-learn DPGMM implementation
is an argument indicating the upper bound for the number of clusters. The
Dirichlet Process then determines the best number of clusters for the data between
1 and the upper bound. Since the DPGMM is flexible enough to find the best value,
I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to
modify the number of clusters formed.
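As a rough sketch of this setup: the DPGMM class the thesis used has since been removed from scikit-learn, but its successor, sklearn.mixture.BayesianGaussianMixture with a Dirichlet-process prior, exposes the same three choices (concentration parameter, data matrix, upper bound on components). The data and parameter values below are synthetic and illustrative:

```python
import random
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in for the song feature matrix: N nested lists of m = 2
# features each, drawn from two well-separated groups.
rng = random.Random(0)
X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(60)] + \
    [[rng.gauss(8, 0.5), rng.gauss(8, 0.5)] for _ in range(60)]

# Upper bound of 10 components; the stick-breaking prior decides how many
# are actually used. weight_concentration_prior plays the role of alpha.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,   # alpha: smaller -> fewer clusters
    covariance_type="diag",
    max_iter=500,
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
n_used = len(set(labels))  # effective number of clusters actually used
```

Despite the 10-component upper bound, the model concentrates its weight on the few components the data supports, mirroring how the 50-cluster ceiling above is only an upper bound.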
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features
to be used for clustering. In other words, when we organize the songs into clusters,
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and
intuitively logical. In the Million Song Dataset [9], each song is represented as a
JSON object containing several fields. These fields are candidate features to be used
in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up"
by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1, 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1, 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated
with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.86 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most
likely yield meaningful results, yet also be simple and make sense to the average
person. The definition of "meaningful" results is subjective, as every music listener
will have his or her own opinions about what constitutes different types of music, but
some common features most people tend to differentiate songs by are pitch, rhythm,
and the types of instruments used. The following specific fields provided in each song
object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor for
songs, since it reveals patterns of notes that occur. Additionally, other research
papers that quantitatively examine songs, like Mauch's, look at pitch and employ a
procedure that allows all songs to be compared with the same metric. Likewise,
timbre is intuitively a reliable differentiating feature, since it reveals the prevalence
of different tones, or sounds that sound different despite having the same pitch.
Therefore, segments_timbre is another feature that is considered in each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these
features appear to be useful, as they indicate the rhythm of a song in one way or
another. However, none of these features are as useful as the pitch and timbre
features. While tempo is one factor in differentiating genres of EDM, and music in
general, tempo alone is not a driving force of musical innovation. Certain genres
of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos,
but the tempo is supplemented with a sound unique to the genre. Conceiving new
arrangements of pitches, combining instruments in new ways, and inventing new
types of sounds are novel, but speeding up or slowing down existing sounds is not.
Including tempo as a feature could actually add noise to the model, since many genres
overlap in their tempos. And finally, tempo is measured indirectly when the pitch
and timbre features are normalized for each song: everything is measured in units of
"per second," so faster songs will have higher quantities of pitch and timbre features
each second. Time signature can be dismissed from the candidate features for the
same reason as tempo: many genres share the same time signature, and including
it in the feature set would only add more noise. beats_start looks like a more
promising feature since, like segments_pitches and segments_timbre, it consists of
a vector of values. However, difficulties arise when we begin to think how exactly
we can utilize this information. Since each song varies in length, we need a way to
compare songs of different durations on the same level. One approach could be to
perform basic statistics on the distance between each beat, for example calculating
the mean and standard deviation of this distance. However, the normalized pitch
and timbre information already capture this data. Another possibility is detecting
certain patterns of beats, which could differentiate the syncopated dubstep or glitch
music beats from the steady pulse of electro-house. But once again, every beat is
accompanied by a sound with a specific timbre and pitch, so this feature would not
add any significantly new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect
all of the electronic songs in the Million Song Dataset. The easiest reliable way to
achieve this was to iterate through each song in the database and save the information
for the songs where any of the artist genre tags in artist_mbtags matched an
electronic music genre. While this measure was not fully accurate, because it looks at
the genre of the artist rather than the song, specific genre information for each song
was not as easily accessible, so this indicator served as a reasonable substitute. To
generate a list of the genres that electronic songs would fall under, I manually searched
through a subset of the MSD to find all genres that seemed to be related to electronic
music. In the case of genres that were sometimes, but not always, electronic in nature, such
as disco or pop, I erred on the side of caution and did not include them in the list
of electronic genres. In these cases, false positives, such as primarily rock songs that
happen to have the disco label attached to the artist, could inadvertently be included
in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
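A minimal sketch of this filtering step (the helper name is mine, and the iteration over the actual MSD files is not shown):

```python
# Genres treated as electronic, per the list above. Note that deliberately
# ambiguous genres like 'disco' and 'pop' are absent.
TARGET_GENRES = {
    'house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
    'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
    'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic',
}

def is_electronic(artist_mbtags):
    """Keep a song if any musicbrainz tag on its artist is an EM genre."""
    return any(tag.strip().lower() in TARGET_GENRES for tag in artist_mbtags)
```

Matching on lowercased tags guards against capitalization differences in the raw tag strings.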
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a
musically informed manner. The study first takes the raw sound data and converts it
into a distribution over each pitch, where 0 is no detection of the pitch and 1 the
strongest amount. Then it computes the most likely chord by comparing the 4 most
common types of chords in popular music (major, minor, dominant 7, and minor 7) to
the observed chord. The most common chords are represented as "template chords"
and contain 0's and 1's, where the 1's represent the notes played in the chord. For
example, using the note C as the first index, the C major chord is represented as

CT_{C major} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's rho coefficient is
computed over every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − mean(CT)) (c_i − mean(c)) / (σ_CT σ_c)
where mean(CT) is the mean of the values in the template chord, σ_CT is the standard
deviation of the values in the chord, and the operations on c are analogous. Note
that the summation is over each individual pitch in the 12 pitch classes. The chord
template with the highest value of ρ is selected as the chord for the time frame.
After this is performed for each time frame, the values are smoothed, and then the
change between adjacent chords is observed. The reasoning behind this step is that,
by measuring the relative distance between chords rather than the chords themselves,
all songs can be compared in the same manner even though they may have different
key signatures. Finally, the study takes the types of chord changes and classifies
them under 8 possible categories called "H-topics." These topics are more abstracted
versions of the chord changes that make more sense to a human, such as "changes
involving dominant 7th chords."
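The template-matching step can be sketched as follows. The thesis names Spearman's rho, but the printed formula is computed directly on the template and chroma values, so that is what this sketch implements; the (type, root) chord labels are my own convention:

```python
from statistics import mean, pstdev

# Binary 12-note templates for the 4 chord types, rooted at C (index 0).
TEMPLATES = {
    'maj':  (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0),
    'min':  (1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0),
    'dom7': (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0),
    'min7': (1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0),
}

def rotate(template, k):
    """Transpose a template up by k semitones."""
    return template[-k:] + template[:-k] if k else template

def rho(ct, c):
    """Correlation between template chord ct and chroma frame c, per the
    formula above (a constant 1/12 factor would not change the argmax)."""
    m_ct, m_c = mean(ct), mean(c)
    num = sum((ct[i] - m_ct) * (c[i] - m_c) for i in range(12))
    return num / (pstdev(ct) * pstdev(c))

def best_chord(chroma):
    """Return the (type, root) pair whose template correlates best with chroma."""
    scores = {(name, root): rho(rotate(t, root), chroma)
              for name, t in TEMPLATES.items() for root in range(12)}
    return max(scores, key=scores.get)
```

For example, a chroma frame with strong C, E, and G energy is matched to the C major template.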
In my preliminary implementation of this method on an electronic dance music
corpus, I made a few modifications to Mauch's study. First, I smoothed out time
frames before computing the most probable chords, rather than smoothing the most
probable chords. I did this to save time and to reduce volatility in the chord
measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference,
which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly
under 1 second, and for preliminary testing this appeared to be a good interval for
each time block. Second, as mentioned in the literature section, I did not abstract
the chord changes into H-topics. This decision also stemmed from time constraints,
since deriving semantic chord meaning from EDM songs would require careful research
into the types of harmonies and sounds common in that genre of music. Below I
include a high-level visualization of the pitch metadata found in a sample song,
"Firestarter" by The Prodigy, and how I converted the metadata into a chord change
vector that I could then feed into the Dirichlet Process algorithm.
[Figure: pipeline for converting a song's pitch metadata into a chord change vector, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes; the first 5 time frames are shown. (2) Average the distribution of pitches over every 5 time frames. (3) Calculate the most likely chord for each block of 5 time frames using Spearman's rho (here F# major, (0,1,0,0,0,0,1,0,0,0,1,0)). (4) For each pair of adjacent chords (here F# major to G# major, a major-to-major change with step size 2, chord shift code 6), calculate the change between them and increment the corresponding count in a table of chord change frequencies (192 possible chord changes), e.g. chord_changes[6] += 1. The result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the numbers by
the length of the song, so that each song's number of chord changes was measured per
second.
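The counting and normalization steps can be sketched as follows. The exact code assignment for the 192 = 4 × 4 × 12 chord-change categories is not spelled out in the text, so the scheme below, chord-type pair plus root interval mod 12, is one consistent choice:

```python
CHORD_TYPES = ('maj', 'min', 'dom7', 'min7')

def change_code(chord_a, chord_b):
    """Map a pair of chords, each (type_index 0-3, root 0-11), to one of
    4 * 4 * 12 = 192 codes. Taking the interval between roots mod 12 makes
    the encoding independent of the song's key, as described above."""
    t1, r1 = chord_a
    t2, r2 = chord_b
    return (t1 * 4 + t2) * 12 + ((r2 - r1) % 12)

def chord_change_vector(chords, duration_seconds):
    """Count each chord change in a song, normalized to changes per second."""
    counts = [0.0] * 192
    for a, b in zip(chords, chords[1:]):
        counts[change_code(a, b)] += 1.0
    return [c / duration_seconds for c in counts]
```

Under this scheme codes 0, 60, 120, and 180 are the four "same chord type, no note change" transitions.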
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8]. After collecting all song metadata, I took a random
sample of 20 songs from each year, starting at 1970. The reason I forced the sampling
to 20 randomly sampled songs from each year, and did not take a random sample of
songs from all years at once, was to prevent bias towards any type of sounds. As seen
in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The
mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the
years is σ = 7.060. A "random sample" over all songs would almost certainly include
a disproportionate number of more recent songs. In order not to miss out on sounds
that may be more prevalent in older songs, I required a set number of songs from each
year. Next, from each randomly selected song, I selected 20 random timbre frames,
in order to prevent any biases in data collection within each song. In total, there
were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames
using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to
100 and selecting the number of clusters with the lowest Bayesian Information Criterion
(BIC), a statistical measure commonly used to select the best-fitting model. The
BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters
and saved the mean values of each of the 12 timbre coefficients for each cluster formed.
In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs, each song now had the same 46 timbre clusters
but different frequencies in each song. When reading in the metadata from each song,
I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song. Finally,
as with the pitch data, I divided all observed counts by the duration of the song in
order to normalize each song's timbre counts.
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the
data. For each song, I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list, giving each song a total of 238
features. However, there is a problem with this setup: the pitch data will inherently
dominate the clustering process, since it contains almost 3 times as many features
as timbre. While there is no built-in function in scikit-learn's DPGMM process to
give different weights to each feature, I considered another possibility to remedy
this discrepancy: duplicating the timbre vector a certain number of times and
concatenating that to the feature set of each song. While this strategy runs the risk
of corrupting the feature set and turning it into something that does not accurately
represent each song, it is important to keep in mind that even without duplicating
the timbre vector, the feature set consists of two separate feature sets concatenated
to each other. Therefore, timbre duplication appears to be a reasonable strategy to
weigh pitch and timbre more evenly.
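A sketch of this reweighting; the number of timbre copies is illustrative, since the text does not state the exact multiplier used:

```python
def build_feature_vector(chord_changes, timbre_freqs, timbre_copies=4):
    """Concatenate the 192 chord-change frequencies with several copies of
    the 46 timbre-category frequencies, so pitch cannot dominate purely by
    feature count (192 vs. 46)."""
    assert len(chord_changes) == 192 and len(timbre_freqs) == 46
    return list(chord_changes) + list(timbre_freqs) * timbre_copies

# With 4 copies, timbre contributes 184 features against pitch's 192.
```

Duplicating a feature block is a crude but transparent way to rebalance feature groups when the clustering API offers no per-feature weights.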
After this modification, I tweaked a few more parameters before obtaining my
final results. Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second, but it also had the undesired effect
of making the data too small. Timbre and pitch frequencies per second were almost
always less than 10, and many times hovered as low as 0.002 for nonzero values.
Because all of the values were very close to each other, using common values of α
in the range of 0.1 to 1000-2000 was insufficient to push the songs into different
clusters. As a result, every song fell into the same cluster. Increasing the value
of α by several orders of magnitude, to well over 10 million, fixed the problem, but
this solution presented two problems. First, tuning α to experiment with different
ways to cluster the music would be problematic, since I would have to work with
an enormous range of possible values for α. Second, pushing α to such high values
is not appropriate for the Dirichlet Process. Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters, but a value
of α that high is in principle always assigning each new song to a new cluster. On
the other hand, varying α between 0.1 and 1000, for example, presents a much wider
range of flexibility when assigning clusters. While clustering may be possible by
varying the values of α an extreme amount with the data as it currently is, we would
be using the Dirichlet Process in a way it mathematically should not be used. Therefore,
multiplying all of the data by a constant value, so that we can work in the appropriate
range of α, is the ideal approach. After some experimentation, I found that k = 10 was
an appropriate scaling factor. After initial runs of the Dirichlet Process, I found
that there was a slight issue with some of the earlier songs. Since I had only artist
genre tags, not song-specific tags, for each song, I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre, including
the generic term 'electronic'. There were some bands, mostly older ones from the
1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured
mostly non-electronic songs, I decided to exclude them from my study and generate
a blacklist of these music artists. While it was infeasible to look through
every single song and determine whether it was electronic or not, I was able to look
over the earliest songs in each cluster. These songs were the most important to verify
as electronic, because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify different groups in which EM songs are
clustered and to identify the most unique artists and genres. While the second task is
very simple, because it requires looking at the earliest songs in each cluster, the first
is difficult to gauge the effectiveness of. While I can look at the average chord change
and timbre category frequencies in each cluster, as well as other metadata, putting
more semantic interpretations to what the music actually sounds like, and determining
whether the music is clustered properly, is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clustering for each value, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit of components, or clusters, allowed to 50. The values of α I used
resulted in 9, 14, and 19 clusters formed, respectively.
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are
skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table listing each cluster formed, the number of songs in that cluster, and descriptions
of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is
shown below.
Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial, space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and since they did not sound unique,
I discarded them from the clusters). Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but echoing
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin, rhythmic emphasis on first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them,
for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial, metal, and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist, ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast, ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process these songs were lumped together with the common theme of dense, melodic
melodies (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists among the earlier songs are Ashra and Jon Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, has a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-
note rhythm that, combined with the ethereal synths that contain certain chord
progressions, sounded very similar to trance music. In fact, I found that stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
music was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself, but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. Instead, I compared my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
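Mauch's approach of auditioning representative sounds for each timbre category, which was impractical to reproduce with human listeners here, could be approximated computationally by pulling the frames nearest each category centroid. A minimal sketch follows; the function name is mine, and the centroids are assumed to be vectors like the TIMBRE_CLUSTERS entries of Appendix A.4.

```python
import numpy as np

def nearest_frames(frames, centroids, k=5):
    """For each timbre centroid, return the indices of the k frames
    closest in Euclidean distance -- candidate audio snippets to
    audition when attaching a sound to each timbre category."""
    frames = np.asarray(frames, dtype=float)
    out = {}
    for idx, c in enumerate(np.asarray(centroids, dtype=float)):
        dists = np.linalg.norm(frames - c, axis=1)
        out[idx] = np.argsort(dists)[:k].tolist()
    return out
```

Listening to only the returned frames per category would give the timbre categories the semantic labels they currently lack.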
While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth looking not only at interesting phenomena in the clusters formed for that specific value but also at how those clusters compare to the ones formed for other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3 under α = 0.1 contained a similar number of songs, and a similar distribution of release years, to cluster 9 under α = 0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3 under α = 0.1, they were different from the earliest artists in cluster 9 under α = 0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16 under α = 0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 under α = 0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 under α = 0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
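The role of α can be illustrated with scikit-learn's truncated Dirichlet Process Gaussian mixture, standing in for the thesis's own implementation: larger concentration values make the model more willing to spend weight on additional components. The two-dimensional data below is synthetic and purely illustrative, not MSD features.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# toy "songs": three well-separated styles in a 2-D feature space
X = np.vstack([rng.normal(loc=center, scale=0.3, size=(200, 2))
               for center in ([0, 0], [4, 0], [0, 4])])

for alpha in (0.05, 0.1, 0.2):
    dp = BayesianGaussianMixture(
        n_components=20,  # truncation level of the stick-breaking prior
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        max_iter=500, random_state=0)
    labels = dp.fit_predict(X)
    print(alpha, 'clusters used:', len(np.unique(labels)))
```

On real song features the effect is the one described in the text: small α collapses the data into a few broad clusters, while larger α splits off increasingly nuanced ones.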
Finally, we look at α = 0.2. With this parameter value the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17 under α = 0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28 under α = 0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect this difficulty: the y-axis values on all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 under α = 0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies for addressing those weaknesses; I then offer potential paths for researchers to build upon my experiment and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from Last.fm, which contains user-generated tags at the song level.
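Song-level filtering with such tags could replace the artist-level criterion with only a small change. A hypothetical sketch, where the per-track tag list stands in for however the Last.fm companion data would actually be loaded:

```python
def is_em_song(track_tags, target_genres):
    """Keep a track only if its own user-generated tags (e.g. from a
    song-level tag dataset such as Last.fm's) intersect the EM genre
    list, rather than relying on artist-level tags as in Appendix A.1."""
    tags = {t.lower() for t in track_tags}
    return any(genre in tags for genre in target_genres)

target_genres = ['house', 'techno', 'trance', 'ambient', 'breakbeat']
print(is_em_song(['Techno', 'german'], target_genres))  # True: a techno track passes
print(is_em_song(['rock', 'guitar'], target_genres))    # False: a rock track is filtered out
```

This would keep the occasional EM track by a mostly-rock artist while dropping that artist's rock output, which the artist-level filter cannot do.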
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
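The BIC-driven choice of the number of timbre categories mentioned above can be sketched with scikit-learn's GaussianMixture standing in for the fitting code actually used; the data below is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# synthetic stand-in for the sampled timbre frames
frames = np.vstack([rng.normal(loc=c, scale=0.5, size=(150, 2))
                    for c in ([0, 0], [3, 3], [-3, 3])])

# fit mixtures of increasing size and keep the BIC-minimizing one
best_k, best_bic = None, np.inf
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic = k, bic
print('BIC-optimal number of mixture components:', best_k)
```

On the real timbre frames this selection procedure is what would yield a category count like the 46 cited above; pairing it with Mauch-style listening samples would then attach a sound to each selected component.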
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the songs accessed from the dataset and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it does at all? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows, and groups such as Spotify amass greater amounts of information and deeper insights from it, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the pitch, timbre, and metadata of each electronic
song from the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = ('/scratch/network/mssilver/mssilver/msd_data/raw_'
                   + re.sub('', '', sys.argv[1]) + '.txt')
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (applied per column via zip)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic
song and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):  # pattern partly lost in transcription
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
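The chord-change encoding in the loop above can be sanity-checked in isolation. The standalone function below is my own restatement of that mapping, not part of the thesis code; it reproduces the type 0/60/120/180 peaks discussed in Chapter 3.

```python
def chord_change_category(c1, c2):
    """Map a chord transition to one of 192 categories.

    Each chord is (quality, root): quality in 1..4 (1 = major,
    2 = minor, 3 = dominant-7th major, 4 = dominant-7th minor)
    and root in 0..11, as in the preprocessing loop above.
    """
    note_shift = (c2[1] - c1[1]) % 12    # semitone motion of the root
    key_shift = 4 * (c1[0] - 1) + c2[0]  # ordered quality pair, 1..16
    return 12 * (key_shift - 1) + note_shift

# same-chord transitions land exactly on the observed peaks
assert chord_change_category((1, 0), (1, 0)) == 0     # major -> major
assert chord_change_category((2, 5), (2, 5)) == 60    # minor -> minor
assert chord_change_category((3, 7), (3, 7)) == 120   # dom7 major -> dom7 major
assert chord_change_category((4, 2), (4, 2)) == 180   # dom7 minor -> dom7 minor
```

The modulo form of `note_shift` is equivalent to the three-branch version used in the loop above.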
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
                    1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
                    1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
                    1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
                    1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
                    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
                    2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)  # pattern partly lost in transcription
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural; the templates are written here
# as 12-character bit strings for compactness
def _bits(s):
    return [int(c) for c in s]

CHORD_TEMPLATE_MAJOR = [_bits(s) for s in [
    '100010010000', '010001001000', '001000100100', '000100010010',
    '000010001001', '100001000100', '010000100010', '001000010001',
    '100100001000', '010010000100', '001001000010', '000100100001']]
CHORD_TEMPLATE_MINOR = [_bits(s) for s in [
    '100100010000', '010010001000', '001001000100', '000100100010',
    '000010010001', '100001001000', '010000100100', '001000010010',
    '000100001001', '100010000100', '010001000010', '001000100001']]
CHORD_TEMPLATE_DOM7 = [_bits(s) for s in [
    '100010010010', '010001001001', '101000100100', '010100010010',
    '001010001001', '100101000100', '010010100010', '001001010001',
    '100100101000', '010010010100', '001001001010', '000100100101']]
CHORD_TEMPLATE_MIN7 = [_bits(s) for s in [
    '100100010010', '010010001001', '101001000100', '010100100010',
    '001010010001', '100101001000', '010010100100', '001001010010',
    '000100101001', '100010010100', '010001001010', '001000100101']]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
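The find_most_likely_chord helper itself lives in msd_utils and is not reproduced in this appendix. Given that per-template means and standard deviations are precomputed above, a plausible implementation scores a 12-dimensional chroma vector against each template by Pearson correlation; the following is a hypothetical sketch of that idea, not the thesis's actual code, shown with just two templates for brevity.

```python
import numpy as np

# Illustrative binary chord templates (1 where a chord tone falls);
# the full 48-template set is defined above in this appendix.
TEMPLATES = {
    (1, 0): [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],  # C major
    (2, 0): [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],  # C minor
}

def find_most_likely_chord(chroma, templates=TEMPLATES):
    """Return the (quality, root) pair whose template correlates best
    with the 12-dimensional chroma vector."""
    chroma = np.asarray(chroma, dtype=float)

    def corr(template):
        return np.corrcoef(chroma, np.asarray(template, dtype=float))[0, 1]

    return max(templates, key=lambda key: corr(templates[key]))

# a chroma frame dominated by C, E, G should read as C major
frame = [0.9, 0.05, 0.0, 0.1, 0.8, 0.0, 0.0, 0.85, 0.0, 0.05, 0.0, 0.0]
assert find_most_likely_chord(frame) == (1, 0)
```

Correlation-based matching is a standard chord-recognition baseline, which is consistent with the template means and standard deviations being cached above.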
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in
        segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01)
                * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, mar 2012.
[2] Kenneth Taylor. Ishkur's guide to edm. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, mar 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, oct 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, jan 2014.
[6] About the music genome project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, jul 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric bayes and the dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, mar 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, mar 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, dec 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres. This may serve as a good starting point to compare my final results to, but my study aims to be as context-free as possible, attaching no preconceived notions of music styles or genres and instead looking at features that could be measured in every song.
While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking where skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.
In the academic realm there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCC) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time music has been becoming louder and less diverse:
The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels, could be easily perceived as novel, fashionable, and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
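As an illustration of that transposition-and-thresholding idea, here is a minimal numpy sketch; the 0.5 activation threshold and the sample chroma vector are made up for illustration, not values taken from the cited study.

```python
import numpy as np

def normalize_chroma(chroma, key, threshold=0.5):
    """Transpose a 12-bin chroma vector into a common tonal context (C),
    then binarize each pitch class against a strength threshold."""
    transposed = np.roll(np.asarray(chroma), -key)  # the key moves to bin 0
    return (transposed >= threshold).astype(int)

# a segment in D (key = 2) with a strong tonic and fifth
segment = [0.1, 0.0, 0.9, 0.1, 0.3, 0.0, 0.0, 0.2, 0.0, 0.8, 0.1, 0.0]
print(normalize_chroma(segment, key=2))  # [1 0 0 0 0 0 0 1 0 0 0 0]
```

After normalization, the tonic and the fifth land in the same bins (0 and 7) for every song, so songs in different keys become directly comparable.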
While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)
While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress I would be able to achieve at least one of the levels of abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons", or a dictionary of pitch and timbre terms that all songs can be compared to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, music sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue like chords for pitches to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k=1 to k=N clusters, where N is a large number, running the GMM for each prior assumption of k clusters and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
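The BIC-based choice of k described above can be sketched with scikit-learn's GaussianMixture on synthetic data; the three-blob dataset and the upper bound of 8 candidate values of k below are arbitrary stand-ins, not the corpus or the N used in Mauch's study.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic "timbre" vectors drawn from three well-separated components
data = np.vstack([
    rng.normal(loc=-5.0, scale=0.5, size=(100, 12)),
    rng.normal(loc=0.0, scale=0.5, size=(100, 12)),
    rng.normal(loc=5.0, scale=0.5, size=(100, 12)),
])

# fit a GMM for each candidate k and keep the one with the lowest BIC
bics = []
for k in range(1, 9):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(data)
    bics.append(gmm.bic(data))
best_k = int(np.argmin(bics)) + 1
print(best_k)  # the BIC-optimal number of timbre clusters
```

BIC rewards likelihood but penalizes parameter count, which is why it stops adding components once the extra flexibility no longer explains the data.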
For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms, creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but also pay careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources for this thesis to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus of the song, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations such as The Echo Nest, Musicbrainz, 7digital, and Last.fm have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, laying the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMM), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong our end results may end up with clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
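A quick way to see the "rich get richer" behavior and the effect of α is to simulate the Chinese Restaurant Process, a standard constructive view of the DP; the sample sizes and α values below are arbitrary.

```python
import random

def crp(n_points, alpha, seed=0):
    """Simulate Chinese Restaurant Process cluster sizes: point i joins an
    existing cluster with probability proportional to that cluster's size,
    or starts a new cluster with probability proportional to alpha."""
    rng = random.Random(seed)
    counts = []                      # counts[k] = size of cluster k
    for i in range(n_points):
        r = rng.uniform(0, i + alpha)
        if r < alpha or not counts:
            counts.append(1)         # start a new cluster
        else:
            r -= alpha               # walk existing clusters by size
            for k, size in enumerate(counts):
                if r < size:
                    counts[k] += 1
                    break
                r -= size
            else:
                counts[-1] += 1      # guard against float round-off
    return counts

# larger alpha yields more clusters; big clusters keep attracting points
print(len(crp(1000, alpha=0.1)), len(crp(1000, alpha=10.0)))
```

Because each new point's chance of starting a cluster is α/(i + α), that chance shrinks as more points arrive, which is exactly the diminishing new-group probability described above.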
The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection".

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
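For reference, a minimal sketch of this setup in current scikit-learn. The DPGMM class used here has since been removed from the library; its present-day equivalent is BayesianGaussianMixture with a Dirichlet-process prior, where weight_concentration_prior plays the role of α and n_components is the upper bound. The two-blob dataset and the specific parameter values are illustrative only.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# two well-separated synthetic "song feature" blobs
data = np.vstack([
    rng.normal(loc=-4.0, scale=0.5, size=(150, 2)),
    rng.normal(loc=4.0, scale=0.5, size=(150, 2)),
])

# Dirichlet-process mixture: n_components is only an upper bound on the
# number of clusters, and weight_concentration_prior acts like alpha
dpgmm = BayesianGaussianMixture(
    n_components=50,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,
    random_state=0,
).fit(data)

labels = dpgmm.predict(data)
print("clusters actually used:", len(set(labels)))
```

Even with the upper bound at 50, the stick-breaking prior shrinks the weights of unneeded components, so only a handful of clusters end up with any points assigned to them.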
2.2 Feature Selection
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1, 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade in, at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1, 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener will have his or her own opinions on what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it distinguishes tones or sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly
we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated beats of dubstep or glitch music from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
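For concreteness, the discarded inter-beat statistic described above amounts to one line of arithmetic on the beats_start vector; a minimal sketch (the function name is mine, not from the thesis code):

```python
import numpy as np

def beat_stats(beats_start):
    """Mean and standard deviation of the gaps between consecutive beats.

    `beats_start` is the vector of beat onset times in seconds; this is the
    inter-beat statistic considered, and ultimately discarded, above.
    """
    gaps = np.diff(np.asarray(beats_start, dtype=float))
    return float(gaps.mean()), float(gaps.std())
```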
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist and not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music.
In the case of genres that were sometimes but not always electronic in nature, such
as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
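The tag-matching step itself can be sketched as follows. In the actual pipeline this predicate would be evaluated while iterating over each song's HDF5 file in the MSD; the function name here is mine, shown only to illustrate the artist-tag matching logic described above:

```python
def is_electronic(artist_mbtags, target_genres):
    """True if any of the artist's MusicBrainz tags is an electronic genre.

    Matching on artist tags rather than per-song tags is the proxy used in
    the text, since song-level genre labels were not readily available.
    """
    tags = {tag.strip().lower() for tag in artist_mbtags}
    return any(genre in tags for genre in target_genres)
```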
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as
CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is computed over every template chord:
ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − mean(CT))(c_i − mean(c)) / (σ_CT σ_c)
where mean(CT) is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
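The per-frame template-matching step can be sketched directly from the formula above. The helper names are mine, and the correlation is computed on the raw chroma values exactly as the formula is written:

```python
import numpy as np

# Template chords from the four types above; index 0 = C, so C major is
# (1,0,0,0,1,0,0,1,0,0,0,0) as in the text.
CHORD_TYPES = {"major": [0, 4, 7], "minor": [0, 3, 7],
               "dominant 7th": [0, 4, 7, 10], "minor 7th": [0, 3, 7, 10]}
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def templates():
    """Generate all 48 template chords: 4 types x 12 roots."""
    for root in range(12):
        for name, intervals in CHORD_TYPES.items():
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0
            yield f"{NOTES[root]} {name}", t

def rho(ct, c):
    """The correlation formula above, applied to the raw values."""
    ct = ct - ct.mean()
    c = np.asarray(c, dtype=float)
    c = c - c.mean()
    return float((ct * c).sum() / np.sqrt((ct**2).sum() * (c**2).sum()))

def best_chord(chroma):
    """Select the template chord with the highest rho for one chroma frame."""
    return max(templates(), key=lambda nt: rho(nt[1], chroma))[0]
```

A frame with strong C, E, and G energy, for example, matches the C major template.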
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below, I included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: pitch preprocessing pipeline, illustrated on the first 5 time frames (TF1–TF5) of "Firestarter" by The Prodigy. Start with the raw pitch data, an N×12 matrix where N is the number of time frames in the song and 12 the number of pitch classes. Average the distribution of pitches over every block of 5 time frames, then calculate the most likely chord for each block using Spearman's rho; here the best-matching template is F♯ major, (0,1,0,0,0,0,1,0,0,0,1,0).]
[Figure, continued: for two adjacent chords (here F♯ major → G♯ major, a major-to-major change with step size 2, chord shift code 6), calculate the change between them and increment its count in a table of chord change frequencies (192 possible chord changes), i.e. chord_changes[6] += 1. The result is a final 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
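Putting these steps together, the per-song chord change vector (with per-second normalization) might look like the sketch below. The exact 192-code assignment used in the thesis is not spelled out here, so the encoding (4 types × 4 types × 12 root steps = 192) is one illustrative possibility, not necessarily the thesis's code table:

```python
import numpy as np

TYPES = ["major", "minor", "dominant 7th", "minor 7th"]

def change_code(prev, cur):
    """Map a pair of adjacent chords to one of 4 x 4 x 12 = 192 codes.

    A chord is a (root, type) pair with root in 0..11 and type in TYPES.
    This encoding is an illustrative assumption.
    """
    step = (cur[0] - prev[0]) % 12   # relative root movement in semitones
    return (TYPES.index(prev[1]) * 4 + TYPES.index(cur[1])) * 12 + step

def chord_change_vector(chords, duration_sec):
    """Count each chord change, then normalize to changes per second."""
    counts = np.zeros(192)
    for prev, cur in zip(chords, chords[1:]):
        counts[change_code(prev, cur)] += 1
    return counts / duration_sec
```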
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000–2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre dimensions for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
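The BIC-driven cluster-count selection described above can be sketched with scikit-learn's GaussianMixture; in the thesis the candidate counts ran from 10 to 100 and the BIC bottomed out at 46:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm(frames, candidate_ks, seed=0):
    """Fit a GMM for each candidate cluster count and keep the lowest BIC.

    `frames` is an (n_frames, 12) array of sampled timbre vectors.
    """
    best_bic, best_model = np.inf, None
    for k in candidate_ks:
        gm = GaussianMixture(n_components=k, random_state=seed).fit(frames)
        bic = gm.bic(frames)
        if bic < best_bic:
            best_bic, best_model = bic, gm
    return best_model
```

Each song's 46-element timbre frequency vector then comes from calling the selected model's predict on that song's timbre frames and counting the labels.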
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
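The concatenation-with-duplication strategy, together with the constant scaling k = 10 described below, can be sketched as follows. The number of timbre copies is not fixed in the text, so the default here (4 copies, giving 184 ≈ 192 timbre features) is an illustrative assumption:

```python
import numpy as np

def build_features(chord_changes, timbre_freqs, timbre_copies=4, scale=10.0):
    """Build one song's feature vector for the Dirichlet Process.

    chord_changes: 192 chord-change frequencies (per second).
    timbre_freqs:  46 timbre-cluster frequencies (per second).
    The timbre block is duplicated so pitch does not drown it out, and the
    whole vector is scaled by a constant (k = 10 in the text) so the values
    land in a range where ordinary settings of alpha separate clusters.
    timbre_copies=4 is an illustrative choice, not the thesis's value.
    """
    feats = np.concatenate([np.asarray(chord_changes, dtype=float)]
                           + [np.asarray(timbre_freqs, dtype=float)] * timbre_copies)
    return scale * feats
```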
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000–2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it mathematically should not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor. After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, putting semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
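The runs themselves used scikit-learn's DPGMM class, which has since been removed from the library; in current scikit-learn the equivalent model is BayesianGaussianMixture with a Dirichlet Process prior, where α is the concentration parameter and the component cap is the truncation level (50 here). A sketch of the equivalent call:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def run_dirichlet_process(features, alpha, max_clusters=50, seed=0):
    """Cluster song feature vectors with a (truncated) Dirichlet Process GMM."""
    dp = BayesianGaussianMixture(
        n_components=max_clusters,                       # truncation level
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,                # the alpha parameter
        random_state=seed,
    ).fit(features)
    labels = dp.predict(features)
    return labels, len(np.unique(labels))                # labels, cluster count
```

Running this at α = 0.05, 0.1, and 0.2 mirrors the three scenarios below, though the replacement class need not reproduce the original cluster counts exactly.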
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, the timbre and pitch distributions, and the cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but with echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin; rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, the timbre and pitch distributions, and the cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic
arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3 under α = 0.1 contained a similar number of songs and a similar distribution of release years to cluster 9 under α = 0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters for α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre and chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and spacey sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process produced 22 clusters; 3 of these contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, leaving 19 clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α runs, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty of telling the clusters apart: the y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. I therefore concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
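The qualitative effect of α can be checked against the Dirichlet Process prior itself. Under the Chinese Restaurant Process view, observation i starts a new cluster with probability α/(α + i), so the expected number of clusters after n observations is Σ α/(α + i), which grows with α. The sketch below is a minimal illustration of that monotonic effect only; the realized cluster counts in this thesis's DP Gaussian Mixture Model also depend heavily on the data likelihood, so these prior expectations are far smaller than the 19-22 clusters observed.

```python
def expected_num_clusters(alpha, n):
    """Expected cluster count under a Chinese Restaurant Process prior:
    observation i starts a new cluster with probability alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))

# larger alpha always favors more clusters a priori
for alpha in (0.05, 0.1, 0.2):
    print(expected_num_clusters(alpha, 23000))
```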
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies for addressing them; I then offer potential paths for researchers to build upon my experiment and close with final remarks on this thesis.
4.1 Design Flaws in the Experiment
While I made every effort to ensure the integrity of this experiment, various factors limited it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. Looking through the tracks, I did see many important artists, meaning that the dataset had some credibility. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason I considered my thesis a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process can be used with some success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
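Song-level filtering of this kind could be layered on top of the existing artist-tag filter: keep a track only if its own tags, rather than its artist's, intersect the EM genre list. The sketch below is hypothetical; the tag structure, tag names, and function names are illustrative, not the actual Last.fm dataset schema or the thesis code.

```python
# hypothetical song-level tag filter; EM_GENRES is a stand-in for the
# thesis's full target_genres list
EM_GENRES = {'house', 'techno', 'trance', 'breakbeat', 'ambient', 'idm'}

def filter_em_tracks(song_tags):
    """song_tags: dict mapping track_id -> iterable of song-level tag strings.
    Returns the track ids whose own tags include at least one EM genre."""
    return [tid for tid, tags in song_tags.items()
            if any(t.lower() in EM_GENRES for t in tags)]

sample = {
    'TR001': ['Techno', 'electronic'],
    'TR002': ['rock', 'guitar'],       # an EM artist's rock song gets dropped
    'TR003': ['Ambient', 'chillout'],
}
print(sorted(filter_em_tracks(sample)))  # ['TR001', 'TR003']
```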
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
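The BIC-based choice of 46 timbre categories mentioned above follows the standard rule BIC = k ln(n) − 2 ln(L), where k is the number of free parameters and L the maximized likelihood; the candidate with the lowest BIC wins, so each extra mixture component must buy enough likelihood to justify its parameters. A minimal sketch of that selection step follows; the component counts, parameter counts, and log-likelihood values here are made-up placeholders, not the thesis's fitted values.

```python
import math

def bic(num_params, n, log_likelihood):
    # Bayesian Information Criterion: lower is better
    return num_params * math.log(n) - 2.0 * log_likelihood

def pick_model(candidates, n):
    """candidates: list of (num_components, num_params, log_likelihood).
    Returns the component count with the lowest BIC."""
    return min(candidates, key=lambda c: bic(c[1], n, c[2]))[0]

# illustrative numbers only
candidates = [(30, 390, -125000.0), (46, 598, -118000.0), (60, 780, -117500.0)]
print(pick_model(candidates, n=460000))  # 46
```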
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, the dataset's songs are accessible, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and predict who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights into that information, this relatively new field of study will hopefully grow with it. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the title, artist, year, duration, pitch, and timbre
data for every electronic music song found in the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip(*...))
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre
category counts for each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# match each per-song dict in the raw text (pattern reconstructed)
for json_object_str in re.finditer(r"\{'title'.*?\]\]\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
               1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
               1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
               1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# match each per-song dict in the raw text (pattern reconstructed)
json_pattern = re.compile(r"\{'title'.*?\]\]\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

def _bits(s):
    # expand a pitch-class bitmask string into a list of 0/1 ints
    return [int(c) for c in s]

CHORD_TEMPLATE_MAJOR = [_bits(s) for s in [
    '100010010000', '010001001000', '001000100100', '000100010010',
    '000010001001', '100001000100', '010000100010', '001000010001',
    '100100001000', '010010000100', '001001000010', '000100100001']]
CHORD_TEMPLATE_MINOR = [_bits(s) for s in [
    '100100010000', '010010001000', '001001000100', '000100100010',
    '000010010001', '100001001000', '010000100100', '001000010010',
    '000100001001', '100010000100', '010001000010', '001000100001']]
CHORD_TEMPLATE_DOM7 = [_bits(s) for s in [
    '100010010010', '010001001001', '101000100100', '010100010010',
    '001010001001', '100101000100', '010010100010', '001001010001',
    '100100101000', '010010010100', '001001001010', '000100100101']]
CHORD_TEMPLATE_MIN7 = [_bits(s) for s in [
    '100100010010', '010010001001', '101001000100', '010100100010',
    '001010010001', '100101001000', '010010100100', '001001010010',
    '000100101001', '100010010100', '010001001010', '001000100101']]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord family (major = 1, minor = 2, dom7 = 3, min7 = 4) and root
    most_likely_chord = (1, 1)
    template_families = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)]
    for (chord_type, templates, means, stdevs) in template_families:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (chord_type, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
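The template-matching idea behind find_most_likely_chord can be illustrated in isolation. The sketch below re-implements the regularized Pearson-style correlation against just the 12 major-triad templates; the names here (MAJOR_TEMPLATES, best_major_triad) are illustrative helpers, not part of the thesis code, which also scores the minor, dominant-seventh, and minor-seventh families the same way.

```python
# 12 major-triad templates, one per root pitch class (C, C#, ..., B);
# these rows mirror CHORD_TEMPLATE_MAJOR above
MAJOR_TEMPLATES = [[1 if i in (r % 12, (r + 4) % 12, (r + 7) % 12) else 0
                    for i in range(12)] for r in range(12)]

def _corr(c, v, eps=0.01):
    # regularized Pearson-style correlation, as in find_most_likely_chord
    n = len(c)
    mc = sum(c) / float(n)
    mv = sum(v) / float(n)
    sc = (sum((x - mc) ** 2 for x in c) / n) ** 0.5
    sv = (sum((x - mv) ** 2 for x in v) / n) ** 0.5
    return sum((c[i] - mc) * (v[i] - mv) for i in range(n)) / ((sc + eps) * (sv + eps))

def best_major_triad(pitch_vector):
    # return the root pitch class (0 = C) whose triad template correlates best
    scores = [_corr(t, pitch_vector) for t in MAJOR_TEMPLATES]
    return max(range(12), key=lambda r: abs(scores[r]))

# a pitch vector with most energy on C, E, and G matches the C major template
v = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]
print(best_major_triad(v))  # 0
```

Feeding the thesis's smoothed pitch segments into find_most_likely_chord works the same way, just over all four template families at once.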
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCC) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has been becoming louder and less diverse:
The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable and groundbreaking.
This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase all songs I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
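The binary discretization step mentioned above can be sketched as follows; the threshold value and the chroma frame are illustrative choices of mine, not numbers taken from the study.

```python
import numpy as np

def binarize_chroma(chroma, threshold=0.5):
    """Discretize a 12-element chroma (pitch-strength) vector to 0/1.

    A pitch class is marked present (1) when its relative strength
    exceeds the threshold; 0.5 here is an illustrative cutoff.
    """
    return (np.asarray(chroma) > threshold).astype(int)

# A made-up chroma frame: strengths for the 12 pitch classes C..B.
frame = [1.0, 0.1, 0.2, 0.05, 0.9, 0.3, 0.1, 0.8, 0.2, 0.1, 0.4, 0.15]
print(binarize_chroma(frame))   # strengths above 0.5 become 1
```

The resulting 0/1 vectors are what make songs in different keys and at different loudness levels comparable under one metric.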
While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's "Bohemian Rhapsody" (1975)
While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or a dictionary of pitch and timbre terms that all songs can be compared to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, music sounds are not merely a collection of pitches, but more precisely chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords, but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song.

Constructing the timbre lexicon is more complicated, since there is no easy analogue like chords for pitches to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM) by iterating over k = 1 to k = N clusters, where N is a large number, running the GMM on each prior assumption of k clusters, and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
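The BIC comparison used to pick k can be sketched as follows. The log-likelihoods and frame count below are invented for illustration; in the actual procedure they would come from fitting a GMM to the timbre frames for each candidate k. The parameter count assumes a diagonal-covariance GMM (k·d means, k·d variances, k − 1 free mixture weights).

```python
import math

def bic(log_likelihood, k_params, n_samples):
    """Bayes Information Criterion: lower is better."""
    return k_params * math.log(n_samples) - 2.0 * log_likelihood

# Invented log-likelihoods for GMMs fitted with k = 1..5 timbre clusters.
loglik = {1: -5200.0, 2: -4800.0, 3: -4650.0, 4: -4630.0, 5: -4625.0}
n, d = 1000, 12                       # assumed frame count and dimensionality

def n_params(k):
    # Diagonal-covariance GMM: k*d means + k*d variances + (k - 1) weights.
    return k * 2 * d + (k - 1)

scores = {k: bic(ll, n_params(k), n) for k, ll in loglik.items()}
best_k = min(scores, key=scores.get)  # k whose BIC is lowest wins
print(best_k)
```

Note how BIC penalizes the extra parameters of larger k: the likelihood keeps improving from k = 3 to k = 5, but not by enough to justify the added complexity.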
For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms, and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but also pay careful attention to identifying which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources for this thesis to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I needed access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus of the song, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, laying the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.
One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a parameter α, the concentration parameter. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
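The "rich get richer" behavior and the role of α can be illustrated with a small simulation of the Chinese restaurant process, one standard view of the Dirichlet Process: each new point joins an existing cluster with probability proportional to that cluster's size, or starts a new cluster with probability proportional to α. This simulation is illustrative only and is not part of the thesis pipeline.

```python
import random

def crp_cluster_sizes(n_points, alpha, seed=0):
    """Simulate cluster assignments under a Chinese restaurant process."""
    rng = random.Random(seed)
    sizes = []                          # sizes[i] = points in cluster i
    for n in range(n_points):
        # P(join cluster i) = sizes[i] / (n + alpha); P(new) = alpha / (n + alpha)
        r = rng.uniform(0, n + alpha)
        acc = 0.0
        for i, s in enumerate(sizes):
            acc += s
            if r < acc:
                sizes[i] += 1           # big clusters attract more points
                break
        else:
            sizes.append(1)             # start a new cluster
    return sizes

# Larger alpha tends to yield more clusters on the same number of points.
small = crp_cluster_sizes(1000, alpha=0.5)
large = crp_cluster_sizes(1000, alpha=10.0)
print(len(small), len(large))
```

Running this shows both effects at once: a handful of clusters absorb most points, and raising α multiplies the number of clusters that ever get started.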
The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation indicates the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features.

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.86 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms.

Pitch:
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm:
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in beats per minute (BPM)

Instruments:
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures differences between tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered for each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features is as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature, since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
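The inter-beat statistics considered (and rejected) above would amount to something like the following; the beats_start values are fabricated for illustration.

```python
import numpy as np

# Hypothetical beats_start vector (seconds): a steady ~0.5 s pulse.
beats_start = np.array([0.10, 0.61, 1.10, 1.59, 2.11, 2.60, 3.10])

intervals = np.diff(beats_start)     # distance between consecutive beats
mean_gap = intervals.mean()
std_gap = intervals.std()
implied_bpm = 60.0 / mean_gap        # a steady pulse implies a tempo estimate
print(round(mean_gap, 3), round(implied_bpm, 1))
```

As the text notes, these summary statistics largely duplicate information already present once pitch and timbre counts are normalized per second, which is why beats_start was left out of the feature set.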
2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
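The tag-matching filter described above can be sketched as follows; the song records and the is_electronic helper are hypothetical stand-ins for the HDF5 metadata and the thesis's actual collection script, and the genre list is abridged.

```python
# Abridged stand-in for the full target_genres list above.
target_genres = {'house', 'techno', 'trance', 'dubstep', 'ambient',
                 'breakbeat', 'electronic'}

def is_electronic(song):
    """Keep a song if any artist-level genre tag matches the target list."""
    return any(tag.lower() in target_genres for tag in song['artist_mbtags'])

# Fabricated song records for illustration.
songs = [
    {'title': 'Firestarter', 'artist_mbtags': ['electronic', 'big beat']},
    {'title': 'Wonderwall',  'artist_mbtags': ['rock', 'britpop']},
]
kept = [s['title'] for s in songs if is_electronic(s)]
print(kept)   # only the electronic-tagged artist survives the filter
```

Because the tags are artist-level, a non-electronic track by an electronic-tagged artist would also pass, which is exactly the inaccuracy acknowledged above.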
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as
CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed against every template chord:

$$\rho_{CT,c} = \frac{1}{12}\sum_{i=1}^{12}\frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}$$

where $\overline{CT}$ is the mean of the values in the template chord, $\sigma_{CT}$ is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner, even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
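A minimal version of the template-matching step can be sketched as follows: build the four C-rooted templates, rotate each through the 12 possible roots, and keep the template with the largest |ρ| against the observed chroma frame, using the correlation formula above. The chroma frame is invented, and computing correlation on raw values (rather than ranks) follows the formula as written; both are sketch-level simplifications of mine.

```python
import numpy as np

# Template chords rooted at C (1 = note present): major, minor, dom7, min7.
BASE_TEMPLATES = {
    'maj':  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'min':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'dom7': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    'min7': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}

def correlate(template, chroma):
    """Correlation between a 0/1 template and a chroma frame (12 bins)."""
    t, c = np.asarray(template, float), np.asarray(chroma, float)
    num = ((t - t.mean()) * (c - c.mean())).sum()
    return num / (12 * (t.std() + 1e-9) * (c.std() + 1e-9))

def most_likely_chord(chroma):
    best, best_rho = None, 0.0
    for name, base in BASE_TEMPLATES.items():
        for root in range(12):            # rotate the template to each root
            rho = correlate(np.roll(base, root), chroma)
            if abs(rho) > abs(best_rho):
                best, best_rho = (name, root), rho
    return best

# A clean G major frame (G, B, D = pitch classes 7, 11, 2).
frame = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(most_likely_chord(frame))
```

The small epsilons in the denominator mirror the stabilizing constants visible in the appendix code, which guard against zero-variance frames.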
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I have included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Visualization: pipeline from raw pitch metadata to chords, illustrated on "Firestarter" by The Prodigy. Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes (the first 5 time frames are shown). Average the distribution of pitches over every 5 time frames. Calculate the most likely chord using Spearman's rho (here F♯ major, template (0,1,0,0,0,0,1,0,0,0,1,0)). Calculate the most likely chord over every other block of 5 time frames.]
13 13 F13 13 major13 G13 major13 13 13 13 Chord13 shift13 code13 =13 613 chord_changes[6]13 +=13 113 13 13 13 13 13 13 13 13 13 13 13 13
13
13 13 13 chord_changes13 =13 [1413 013 313 013 113 013 113 013 013 013 113 013 1113 013 213 013 013 013 213 013 213 113 113 013 113 013 013 013 313 113 013 113 213 013 113 313 113 113 013 013 113 013 013 213 013 013 013 013 1213 113 413 113 213 013 013 013 013 113 113 113 1413 013 613 013 213 013 013 113 013 013 613 013 013 213 013 013 013 013 313 213 013 113 213 113 113 113 013 013 013 013 013 013 013 113 113 013 013 013 013 013 113 213 013 113 013 113 013 113 013 213 113 113 113 113 013 113 013 213 113 113 013 213 113 113 013 013 013 113 013 113 013 513 313 013 013 213 013 013 013 113 013 113 013 113 413 013 013 013 013 013 213 013 013 013 013 013 213 013 213 013 113 013 013 113 013 113 113 013 013 213 013 013 013 013 013 013 113 013 013 013 113 013 013 013 013 013 013 013 013 013 113 0]13 13 Final13 192-shy‐element13 vector13 where13 chord_changes[i]13 is13 the13 number13 of13 times13 the13 chord13 change13 with13 code13 i13 existed13 in13 the13 song13 13 13
Major13 to13 Major13 step13 size13 =13 213 For13 two13 adjacent13 chords13 calculate13 the13 change13 in13 between13 them13 and13 increment13 count13 in13 table13 of13 chord13 change13 frequencies13 (19213 possible13 chord13 changes)13
24
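The 192-element vector implies an encoding of 4 chord types × 4 chord types × 12 root shifts. The mapping below is my own reconstruction, not spelled out in the text: it is consistent with the same-type, zero-shift codes (0, 60, 120, 180) cited later in the Analysis section, but the exact formula and type ordering are assumptions.

```python
# Hypothetical chord-change encoding consistent with the codes quoted in the
# Analysis section: 4 chord types x 4 chord types x 12 root shifts = 192 codes.
CHORD_TYPES = ["major", "minor", "dom7 major", "dom7 minor"]

def chord_change_code(type_from, type_to, root_shift):
    """Map a chord change to an index in the 192-element frequency vector."""
    i = CHORD_TYPES.index(type_from)
    j = CHORD_TYPES.index(type_to)
    return (i * 4 + j) * 12 + (root_shift % 12)
```

Under this assumed scheme, same-type changes with no root movement land on codes 0, 60, 120, and 180, matching the frequently observed codes discussed in the Analysis.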
A final step I took to normalize the chord change data was to divide the counts by
the length of the song, so that each song's number of chord changes was measured per
second.
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8]. After collecting all song metadata, I took a random
sample of 20 songs from each year, starting at 1970. The reason I forced the sampling
to 20 randomly sampled songs from each year, and did not take a random sample of
songs from all years at once, was to prevent bias towards any type of sounds. As seen
in Figure 2.2, there are significantly more songs from 2000–2011 than before 2000. The
mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the
years is σ = 7.060. A "random sample" over all songs would almost certainly include
a disproportionate number of more recent songs. In order not to miss sounds
that may be more prevalent in older songs, I required a set number of songs from each
year. Next, from each randomly selected song I selected 20 random timbre frames,
in order to prevent any biases in data collection within each song. In total, there
were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames
using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to
100 and selecting the number of clusters with the lowest Bayesian Information Criterion
(BIC), a statistical measure commonly used to select the best-fitting model. The
BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters
and saved the mean values of each of the 12 timbre segments for each cluster formed.
In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs, each song now had the same 46 timbre clusters,
but with different frequencies in each song. When reading in the metadata from each song,
I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in the song. Finally,
as with the pitch data, I divided all observed counts by the duration of the song in
order to normalize each song's timbre counts.
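The BIC-driven model selection described above can be sketched with scikit-learn's GaussianMixture (the modern name for the GMM class available at the time). The synthetic data and the small candidate range below are stand-ins for the 16,800 twelve-dimensional timbre frames and the 10–100 sweep.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for the 16,800 x 12 matrix of sampled timbre frames:
# two well-separated synthetic blobs so the selection is easy to verify.
frames = np.vstack([
    rng.normal(0.0, 0.5, size=(200, 12)),
    rng.normal(8.0, 0.5, size=(200, 12)),
])

# Fit a GMM for each candidate cluster count and keep the lowest BIC
# (the thesis sweeps 10..100; a small range keeps this sketch fast).
candidates = range(1, 5)
models = {k: GaussianMixture(n_components=k, random_state=0).fit(frames)
          for k in candidates}
best_k = min(candidates, key=lambda k: models[k].bic(frames))

# Mean timbre vector of each cluster, analogous to the saved cluster means.
cluster_means = models[best_k].means_
```

On the real timbre frames, `best_k` corresponds to the 46 clusters reported above, and `cluster_means` to the saved per-cluster mean timbre vectors.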
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the
data. For each song, I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list, giving each song a total of 238
features. However, there is a problem with this setup: the pitch data will inherently
dominate the clustering process, since it contains almost 3 times as many features
as timbre. While there is no built-in function in scikit-learn's DPGMM process to
give different weights to each feature, I considered another possibility to remedy
this discrepancy: duplicating the timbre vector a certain number of times and
concatenating that to the feature set of each song. While this strategy runs the risk
of corrupting the feature set and turning it into something that does not accurately
represent each song, it is important to keep in mind that, even without duplicating
the timbre vector, the feature set consists of two separate feature sets concatenated
to each other. Therefore, timbre duplication appears to be a reasonable strategy to
weigh pitch and timbre more evenly.
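The feature assembly above can be sketched as follows. The duplication factor of 4 is a hypothetical choice for illustration (the thesis does not state the factor it settled on); it brings the two blocks near parity, since 4 × 46 = 184 versus 192.

```python
import numpy as np

def song_features(chord_changes, timbre_counts, timbre_copies=4):
    """Concatenate pitch and timbre features, duplicating the timbre vector
    `timbre_copies` times so the 46 timbre features are not drowned out by
    the 192 chord-change features. The factor 4 is a hypothetical choice."""
    chord_changes = np.asarray(chord_changes, dtype=float)
    timbre_counts = np.asarray(timbre_counts, dtype=float)
    return np.concatenate([chord_changes, np.tile(timbre_counts, timbre_copies)])

# 192 chord-change features + 4 copies of the 46 timbre features.
features = song_features(np.zeros(192), np.ones(46))
```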
After this modification, I tweaked a few more parameters before obtaining my
final results. Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second, but it also had the undesired effect
of making the data too small. Timbre and pitch frequencies per second were almost
always less than 10, and many times hovered as low as 0.002 for nonzero values.
Because all of the values were very close to each other, using common values of α
in the range of 0.1 to 1000–2000 was insufficient to push the songs into different
clusters. As a result, every song fell into the same cluster. Increasing the value
of α by several orders of magnitude, to well over 10 million, fixed the problem, but
this solution presented two problems. First, tuning α to experiment with different
ways to cluster the music would be problematic, since I would have to work with
an enormous range of possible values for α. Second, pushing α to such high values
is not appropriate for the Dirichlet Process. Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters, but a value
of α that high is in principle always assigning each new song to a new cluster. On
the other hand, varying α between 0.1 and 1000, for example, presents a much wider
range of flexibility when assigning clusters. While clustering may be possible by varying
the values of α an extreme amount with the data as it currently is, that uses
the Dirichlet Process in a way it mathematically should not be used. Therefore,
multiplying all of the data by a constant value, so that we can work in the appropriate
range of α, is the ideal approach. After some experimentation, I found that k = 10 was
an appropriate scaling factor. After initial runs of the Dirichlet Process, I found
that there was a slight issue with some of the earlier songs. Since I had only artist
genre tags, not tags specific to each song, I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre, including
the generic term 'electronic'. There were some bands, mostly older ones from the
1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured
mostly non-electronic songs, I decided to exclude them from my study and generated
a blacklist of these artists. While it was infeasible to look through
every single song and determine whether it was electronic or not, I was able to look
over the earliest songs in each cluster. These songs were the most important to verify
as electronic, because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for.
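The scaling-then-clustering step can be sketched as below. Note a substitution: the scikit-learn DPGMM class used in the thesis has since been removed from the library, and its modern equivalent is BayesianGaussianMixture with a Dirichlet-process weight prior. The data here is synthetic, and the scale factor mirrors the k = 10 found above.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Stand-in for the per-second pitch/timbre frequency matrix; real values
# hovered as low as 0.002, hence the constant scale factor k below.
X_raw = rng.random((300, 10)) * 0.01
k = 10  # scaling constant so that alpha can stay in a sensible range
X = X_raw * k

dp = BayesianGaussianMixture(
    n_components=50,                        # upper limit on clusters
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,         # alpha
    random_state=0,
).fit(X)

# Number of clusters actually used out of the 50 allowed.
n_clusters = len(np.unique(dp.predict(X)))
```

Varying `weight_concentration_prior` over 0.05, 0.1, and 0.2 reproduces the kind of sweep described in the next section, with more clusters typically occupied at larger α.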
The goal of this thesis is to identify different groups in which EM songs are
clustered and to identify the most unique artists and genres. While the second task is
very simple, because it requires looking at the earliest songs in each cluster, the first
is difficult to gauge the effectiveness of. While I can look at the average chord change
and timbre category frequencies in each cluster, as well as other metadata, attaching
more semantic interpretations to what the music actually sounds like, and determining
whether the music is clustered properly, is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clustering for each value, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit of components, or clusters, allowed to 50. The values of α I used
resulted in 9, 14, and 19 clusters formed.
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are
skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions
of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is
shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and since they did not sound unique,
I discarded them from the clusters). Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them,
for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process these songs were lumped together with the common theme of dense, melodic
sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most
prominent artists among the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990, but then increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and
electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-
note rhythm that, combined with the ethereal synths that contain certain chord
progressions, sounded very similar to trance music. In fact, I found that, stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
music was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself, but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change, type 60 to minor → minor with no note change, type
120 to dominant 7th major → dominant 7th major with no note change, and type 180
to dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that adjacent chords in a song remain in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to intuitively interpret. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in mine. I ended up comparing my subjective
summaries of each cluster against the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data is very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed, and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value, but also comparing them to the clusters formed
at other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely, while others were more difficult to interpret. For example, cluster
3 (α = 0.1) contained a similar number of songs, and a similar
distribution of release years, to cluster 9 (α = 0.05). Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05).
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin", for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 run differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked
up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1)
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together based on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machines, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. From this run, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters; 3 of these clusters contained only one song each, and upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters from the other two α values, like cluster 17 (α = 0.2), which contains Roland TR
drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
seem to vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing different clusters. The y-axes for these charts are quite small,
implying that many of the timbre values averaged out because the songs were quite
different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small amount of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
adequately clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments, without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table for α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings for
α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; I then offer potential paths for researchers to build upon my
experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the electronic
music available. From looking through the tracks, I did see many important
artists, meaning that the dataset had some credibility. However, there were
several other artists I was surprised to see missing, and the artists included contained
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed were nontrivial: I could not only
eyeball the measurements on each graph for timbre, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successfully the Dirichlet
Process clusters songs into distinct categories.
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions, procedures,
evaluations of whether clustering was effective, and closer musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD, and constantly updating it, might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, and the dataset,
the songs accessed from the dataset, and methods for comparing songs to each other are
in place, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it ever
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows and groups such as Spotify amass greater amounts of information, and deeper
insights into that information, this relatively new field of study will hopefully
grow as well. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of every electronic song out of the
Million Song Dataset and writes it to disk, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep',
    'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
    'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre
category counts for each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord type: (1, idx) = major, (2, idx) = minor,
    # (3, idx) = dominant 7th, (4, idx) = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec. 2005.
distributions of each among songs over time reveals a lot of information about how
the music industry and consumers' tastes have evolved. While these are not all of the
features contained in a song, they serve as a good starting point. Third, the study
defines mathematical ways to capture musical attributes and measure their change
over time. For example, pitches are transposed into the same tonal context, with
binary discretized pitch descriptions based on a threshold, so that each song can be
represented with vectors of pitches that are normalized and compared to other songs.
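This kind of key normalization can be sketched in a few lines. The function name, the toy pitch vector, and the 0.5 threshold below are illustrative assumptions rather than the study's actual code; the idea is just to rotate each segment into a common tonal context and then discretize it against a threshold.

```python
import numpy as np

def normalize_segment(pitch_strengths, key, threshold=0.5):
    """Rotate a 12-element pitch-strength vector so the tonic sits at
    index 0, then binarize against a threshold so songs in different
    keys can be compared directly."""
    v = np.asarray(pitch_strengths)
    transposed = np.roll(v, -key)  # shift by the song's key
    return (transposed >= threshold).astype(int)

# example: a C-major-shaped segment in the key of C (key = 0)
segment = [0.9, 0.1, 0.2, 0.1, 0.8, 0.2, 0.1, 0.7, 0.1, 0.2, 0.1, 0.1]
print(normalize_segment(segment, key=0))  # -> [1 0 0 0 1 0 0 1 0 0 0 0]
```

Because the binarization happens after the rotation, the number of active pitches is preserved no matter what key the song is in.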
While this study lays some solid groundwork for capturing and analyzing numeric
qualities of music, it falls short of addressing my goals in a couple of ways.
First, it does not perform any analysis with respect to music genre. While the
analysis performed in this paper could easily be applied to a list of songs in a specific
genre, certain genres might have unique sounds and rhythms relative to other genres
that would be worth studying in greater detail. Second, the study only measures
general trends in music over time. The models used to describe changes are simple
regressions that don't look at more nuanced changes. For example, what styles of
music developed over certain periods of time? How rapid were those changes? Which
styles of music developed from which other styles?
A more promising study, led by music researcher Matthias Mauch [8], analyzes
contemporary popular Western music from the 1960s to 2010s by comparing numerical
data on the pitches and timbre of a corpus of 17,000 songs that appeared on the
Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution
of Contemporary Western Popular Music, Mauch's study also creates abstractions
of pitch and timbre in order to provide a consistent and meaningful semantic
interpretation of musical data (see figure 1.2). However, Mauch's study takes this
idea a step further by using genre tags from Last.fm, a music website, and constructing a
hierarchy of music genres using hierarchical clustering. Additionally, the study
takes a crack at determining whether a particular band, the Beatles, was musically
groundbreaking for its time or merely playing off sounds that other bands had
already used.
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of
Queen's Bohemian Rhapsody (1975).
While both Measuring the Evolution of Contemporary Western Popular Music
and Mauch's study created abstractions of pitch and timbre, Mauch's study is more
appealing with respect to my goal because its end results align more closely with
mine. Additionally, the data processing pipeline offers several layers of abstraction,
and depending on my progress, I would be able to achieve at least one of the levels of
abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken
down into its 12 timbre MFCCs and pitch components. Next, the study constructs
"lexicons", or dictionaries of pitch and timbre terms to which all songs can be
compared. For pitch, the original data is an N-by-12 matrix, where N is the number
of time segments in the song and 12 the number of notes found in an octave of
pitches. Each time segment contains the relative strengths of each of the 12 pitches.
However, musical sounds are not merely a collection of pitches but, more precisely,
chords. Furthermore, the similarity of two songs is not determined by the absolute
pitches of their chords but rather by the progression of chords in the song, all
relative to each other. For example, if all the notes in a song are transposed by one
step, the song will sound different in terms of absolute pitch, but it will still be
recognized as the original because all of the relative movements from each chord to
the next are the same. This phenomenon is captured in the pitch data by finding the
most likely chord played at each time segment, then counting the change to the next
chord at each time step and generating a table of chord change frequencies for each song.
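The chord-change bookkeeping can be sketched as follows; the Roman-numeral chord labels, the function name, and the dictionary layout are illustrative assumptions, not the actual lexicon used in the study.

```python
from collections import Counter

def chord_change_table(chord_sequence):
    """Count transitions between consecutive chord labels, yielding a
    frequency table of chord changes. When the labels are expressed
    relative to the tonic, the table is invariant to transposition."""
    transitions = Counter(zip(chord_sequence, chord_sequence[1:]))
    total = sum(transitions.values())
    return {pair: n / total for pair, n in transitions.items()}

# 5 transitions total; ('I', 'IV') occurs twice
changes = chord_change_table(['I', 'IV', 'V', 'I', 'IV', 'V'])
print(changes[('I', 'IV')])  # -> 0.4
```

Each song then reduces to one such table, and two songs can be compared by comparing their tables rather than their raw pitch matrices.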
Constructing the timbre lexicon is more complicated, since there is no easy analogue
to chords that can be used to compare songs' timbres. Mauch's study utilizes a
Gaussian Mixture Model (GMM), iterating over k = 1 to k = N clusters, where N is a
large number, running the GMM under each prior assumption of k clusters, and
computing the Bayes Information Criterion (BIC) for each model. The lowest of the N
BIC values is found, and that value of k is selected. The chosen model contains k
different timbre clusters, and each cluster contains the mean timbre value for each
of the 12 timbre components.
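This model-selection loop can be sketched with scikit-learn's GaussianMixture; the function name, parameter values, and toy data below are illustrative assumptions, not the study's actual configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_timbre_lexicon(timbre_frames, max_k=10, seed=0):
    """Fit GMMs with k = 1..max_k components and keep the model whose
    Bayes Information Criterion (BIC) is lowest; the chosen model's
    component means act as the timbre 'lexicon' entries."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(timbre_frames)
        bic = gmm.bic(timbre_frames)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model

# toy data: two well-separated blobs in a 12-dimensional timbre space
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0, 0.1, (50, 12)), rng.normal(5, 0.1, (50, 12))])
model = fit_timbre_lexicon(frames, max_k=5)
print(model.n_components)  # -> 2
```

The BIC penalizes extra components, so the loop settles on the smallest k that still explains the data well; `model.means_` then gives the lexicon's cluster centers.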
For my research I decided that the pitch and timbre lexicons would be the most
realistic level of abstraction I could obtain Mauchrsquos study adds an addtional layer
to pitch and timbre by identifying the most common patterns of chord changes and
9
most common timbre rhythms and creating more general tags from these combined
terms such as ldquo stepwise changes indicating modal harmonyrdquo for a pitch topic and
ldquooh rounded mellowrdquo for a timbral topic There were two problems with using this
final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples covering all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but also pay careful attention to identifying which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources during this thesis to seek out. Second,
this final layer of abstraction was not necessary for the end goal of my paper In
fact consolidating my pitch and timbre lexicons into simpler phrases would run the
risk of pigeonholing my analysis and preventing me from discovering more nuanced
patterns in my final results Therefore I decided to focus on pitch and timbral
lexicon construction as the furthest levels of abstraction when processing songs for
my thesis. Mathematical details on how I constructed the pitch and timbre lexicons can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis I needed access to an extensive database of
music Until recently acquiring a substantial corpus of music data was a difficult and
costly task It is illegal to download music audio files from video and music-sharing
sites such as YouTube Spotify and Pandora Some platforms such as iTunes offer
90-second previews of songs, but segments of songs, usually ones that showcase the chorus, are not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free I would
run into additional issues Obtaining a high-quality corpus of song data would be
challenging writing scripts that crawl music sharing platforms may not capture all of
the music I am looking for And once I have the audio files I would have to perform
audio processing techniques to extract the relevant information from the songs
Fortunately there is an easy solution to the music data acquisition problem
The Million Song Dataset (MSD) is a collection of metadata for one million music
tracks dating up to 2011 Various organizations such as The Echo Nest Musicbrainz
7digital and Last.fm have contributed different pieces of metadata Each song is
represented as a Hierarchical Data Format file (HDF5) which can be loaded as a
JSON object The fields encompass topical features such as the song title artist
and release date as well as lower-level features such as the loudness starting beat
time pitches and timbre of several segments of the song [9] While the MSD is
the largest free and open source music metadata dataset I could find there is no
guarantee that it adequately covers the entire spectrum of EM artists and songs
This quality limitation is important to consider throughout the study A quick look
through the songs including the subset of data I worked with for this report showed
that there were several well-known artists and songs in the EM scene Therefore
while the MSD may not contain all desired songs for this project it contains an
adequate number of relevant songs to produce some meaningful results Additionally
laying the groundwork for modeling the similarities between songs and identifying
groundbreaking ones is the same regardless of the songs included and the following
methodologies can be implemented on any similarly-formatted dataset including one
with songs that might currently be missing in the MSD
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was and continues to be
an important aspect of my research My problem how to mathematically determine
which songs were unique for their time requires an algorithm in which each song is
introduced in chronological order either joining an existing category or starting a
new category based on its musical similarity to songs already introduced Clustering
algorithms like k-means or Gaussian Mixture Models (GMM), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew
exactly how many genres of EM existed if we guess wrong our end results may end
up with clusters that are wrongly grouped together or separated It is much better to
apply a clustering algorithm that does not make any assumptions about this number
One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are
useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters and (2) by sorting the songs
in chronological order before running the algorithm and keeping track of which
songs are categorized under each cluster we can observe the earliest songs in each
cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new
clusters will be formed [10] Regardless of the value of α as the number of data
points introduced increases the probability of a new group being formed decreases
That is, a "rich get richer" policy is in place and existing clusters tend to grow in
size Tweaking the value of the tunable parameter α is an important part of the
study since it determines the flexibility given to forming a new cluster If the value
of α is too small then the criteria for forming clusters will be too strict and data
that should be in different clusters will be assigned to the same cluster On the other
hand if α is too large the algorithm will be too sensitive and assign similar songs to
different clusters
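The "rich get richer" behavior and the role of α can be illustrated with a small simulation of the Chinese restaurant process view of the DP; the seating probabilities below are the standard CRP ones (proportional to cluster size for an existing cluster, to α for a new one), not anything specific to this thesis's data:

```python
import random

def crp_cluster_sizes(n_points, alpha, seed=0):
    # Chinese-restaurant-process seating: point i joins an existing cluster
    # with probability size / (i + alpha), or starts a new cluster with
    # probability alpha / (i + alpha) -- the "rich get richer" dynamic.
    rng = random.Random(seed)
    sizes = []
    for i in range(n_points):
        r = rng.uniform(0, i + alpha)
        if r < alpha or not sizes:
            sizes.append(1)                # new cluster
        else:
            acc, r = 0, r - alpha          # pick an existing cluster by size
            for j, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[j] += 1
                    break
            else:
                sizes[-1] += 1             # guard against float edge cases
    return sizes

few = len(crp_cluster_sizes(1000, alpha=0.1))
many = len(crp_cluster_sizes(1000, alpha=50.0))
```

With the same 1000 points, the small-α run typically ends with only a handful of clusters while the large-α run produces far more, matching the description above.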
The implementation of the DP was achieved using scikit-learn's library and API for
Dirichlet Process Gaussian Mixture Model (DPGMM) The DPGMM is the formal
name of the Dirichlet Process model used to cluster the data More specifically
scikit-learn's implementation of the DPGMM uses the stick-breaking method
one of several equally valid methods to assign songs to clusters [11] While the
mathematical details for this algorithm can be found at the following citation [12]
the most important aspects of the DPGMM are the arguments that the user can
specify and tune The first of these tunable parameters is the value α which is the
same parameter as the α discussed in the previous paragraph As seen in Figure 2.1
on the right side properly tuning α is key to obtaining meaningful clusters The
center image has α set to 0.01, which is too small and results in all of the data being formed under one cluster. On the other hand the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note
the figure also demonstrates the effectiveness of the DPGMM over the GMM On the
left side clearly the dataset contains 2 clusters but the GMM on the top-left image
assumes 5 clusters as a prior and consequently clusters the data incorrectly while
the DPGMM manages to limit the data to 2 clusters
The second argument that the user inputs for the DPGMM is the data that
will be clustered The scikit-learn implementation takes the data in the format
of a nested list (N lists each of length m) where N is the number of data points
and m the number of features While the format of the data structure is relatively
straightforward choosing which numbers should be in the data was a challenge I
faced Selecting the relevant features of each song to be used in the algorithm will
be expounded upon in the next section, "Feature Selection"
The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters The
Dirichlet Process then determines the best number of clusters for the data between
1 and the upper bound Since the DPGMM is flexible enough to find the best value
I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to
modify the number of clusters formed
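The DPGMM class used in this thesis has since been removed from scikit-learn; a rough modern equivalent of the three arguments described above (α, the data as an N×m array, and the upper bound on clusters) is BayesianGaussianMixture with a Dirichlet-process prior. A sketch on toy data, not the thesis's song vectors:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# toy data: three well-separated 2-D clusters standing in for song features
X = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in (0.0, 5.0, 10.0)])

dpgmm = BayesianGaussianMixture(
    n_components=10,                                   # upper bound only (thesis uses 50)
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,                    # plays the role of α
    random_state=0,
).fit(X)

n_used = len(set(dpgmm.predict(X)))  # clusters actually populated
```

Even though n_components is 10, only the clusters the data supports end up populated, which is the flexibility the thesis relies on.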
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features
to be used for clustering In other words when we organize the songs into clusters
Figure 2.1 scikit-learn example of GMM vs DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and
intuitively logical In the Million Song Dataset [9] each song is represented as a
JSON object containing several fields These fields are candidate features to be used
in the Dirichlet algorithm Below is an example song, "Never Gonna Give You Up",
by Rick Astley and the corresponding features
artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count shape = (4) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name Rick Astley (artist name)
artist_playmeid 1338 (the ID of that artist on the service playme.com)
artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest
(number between 0 and 1))
artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest
(number between 0 and 1))
audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for
the analysis by The Echo Nest)
bars_confidence shape = (99) (confidence value (between 0 and 1) associated
with each bar by The Echo Nest)
bars_start shape = (99) (start time of each bar according to The Echo Nest this
song has 99 bars)
beats_confidence shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start shape = (397) (start time of each beat according to The Echo Nest
this song has 397 beats)
danceability 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration 211.69587 (duration of the track in seconds)
end_of_fade_in 0.139 (time of the end of the fade in at the beginning of the
song according to The Echo Nest)
energy 0.0 (energy measure (not in the signal processing sense) according to The
Echo Nest (between 0 and 1 0 = not analyzed))
key 1 (estimation of the key the song is in by The Echo Nest)
key_confidence 0.324 (confidence of the key estimation)
loudness -7.75 (general loudness of the track)
mode 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence 0.434 (confidence of the mode estimation)
release Big Tunes - Back 2 The 80s (album name from which the track was taken
some songs tracks can come from many albums we give only one)
release_7digitalid 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence shape = (10) (confidence value (between 0 and 1) associated
with each section by The Echo Nest)
sections_start shape = (10) (start time of each section according to The Echo
Nest this song has 10 sections)
segments_confidence shape = (935) (confidence value (between 0 and 1) asso-
ciated with each segment by The Echo Nest)
segments_loudness_max shape = (935) (max loudness during each segment)
segments_loudness_max_time shape = (935) (time of the max loudness
during each segment)
segments_loudness_start shape = (935) (loudness at the beginning of each
segment)
segments_pitches shape = (935 12) (chroma features for each segment (normal-
ized so max is 1))
segments_start shape = (935) (start time of each segment ( musical event or
onset) according to The Echo Nest this song has 935 segments)
segments_timbre shape = (935 12) (MFCC-like features for each segment)
similar_artists shape = (100) (a list of 100 artists (their Echo Nest ID) similar
to Rick Astley according to The Echo Nest)
song_hotttnesss 0.864248830588 (according to The Echo Nest when downloaded (in December 2010) this song had a 'hotttnesss' of 0.86 (on a scale of 0 to 1))
song_id SOCWJDB12A58A776AF (The Echo Nest song ID note that a song can
be associated with many tracks (with very slight audio differences))
start_of_fade_out 198.536 (start time of the fade out in seconds at the end
of the song according to The Echo Nest)
tatums_confidence shape = (794) (confidence value (between 0 and 1) associated
with each tatum by The Echo Nest)
tatums_start shape = (794) (start time of each tatum according to The Echo
Nest this song has 794 tatums)
tempo 113.359 (tempo in BPM according to The Echo Nest)
time_signature 4 (time signature of the song according to The Echo Nest, i.e.
usual number of beats per bar)
time_signature_confidence 0.634 (confidence of the time signature estimation)
title Never Gonna Give You Up (song title)
track_7digitalid 8707738 (the ID of this song on the service 7digital.com)
track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track on which the analysis was done)
year 1987 (year when this song was released according to musicbrainz.org)
When choosing features my main goal was to use features that would most
likely yield meaningful results yet also be simple and make sense to the average
person. The definition of "meaningful" results is subjective, as every music listener will have his or her own opinions as to what constitutes different types of music, but some
common features most people tend to differentiate songs by are pitch rhythm and
the types of instruments used The following specific fields provided in each song
object fall under these three terms
Pitch
bull segments_pitches a matrix of values indicating the strength of each pitch (or
note) at each discernible time interval
Rhythm
bull beats_start a vector of values indicating the start time of each beat
bull time_signature the time signature of the song
bull tempo the speed of the song in Beats Per Minute (BPM)
Instruments
bull segments_timbre a matrix of values indicating the distribution of MFCC-like
features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor for
songs since it reveals patterns of notes that occur Additionally other research
papers that quantitatively examine songs like Mauchrsquos look at pitch and employ a
procedure that allows all songs to be compared with the same metric Likewise
timbre is intuitively a reliable differentiating feature since it reveals the presence of different tones, sounds that sound different despite having the same pitch.
Therefore segments_timbre is another feature that is considered in each song
Finally we look at the candidate features for rhythm At first glance all of these
features appear to be useful as they indicate the rhythm of a song in one way or
another However none of these features are as useful as the pitch and timbre
features While tempo is one factor in differentiating genres of EDM and music in
general tempo alone is not a driving force of musical innovation Certain genres
of EDM like drum 'n' bass and happycore stand out for having very fast tempos
but the tempo is supplemented with a sound unique to the genre Conceiving new
arrangements of pitches combining instruments in new ways and inventing new
types of sounds are novel but speeding up or slowing down existing sounds is not
Including tempo as a feature could actually add noise to the model since many genres
overlap in their tempos And finally tempo is measured indirectly when the pitch
and timbre features are normalized for each song everything is measured in units of
"per second", so faster songs will have higher quantities of pitch and timbre features
each second. Time signature can be dismissed from the candidate features for the
same reason as tempo many genres contain the same time signature and including
it in the feature set would only add more noise beats_start looks like a more
promising feature since like segments_pitches and segments_timbre it consists of
a vector of values However difficulties arise when we begin to think how exactly
we can utilize this information Since each song varies in length we need a way to
compare songs of different durations on the same level One approach could be to
perform basic statistics on the distance between each beat for example calculating
the mean and standard deviation of this distance However the normalized pitch
and timbre information already capture this data Another possibility is detecting
certain patterns of beats which could differentiate the syncopated dubstep or glitch
music beats from the steady pulse of electro-house But once again every beat is
accompanied by a sound with a specific timbre and pitch so this feature would not
add any significantly new information
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research I first needed to collect
all of the electronic songs in the Million Song Dataset The easiest reliable way to
achieve this was to iterate through each song in the database and save the information
for the songs where any of the artist genre tags in artist_mbtags matched with an
electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than of the song, specific genre information for each song was not as easily accessible, so artist tags were the closest available substitute. To generate a
list of the genres that electronic songs would fall under I manually searched through
a subset of the MSD to find all genres that seemed to be related to electronic music
In the case of genres that were sometimes but not always electronic in nature such
as disco or pop I erred on the side of caution and did not include them in the list
of electronic genres In these cases false positives such as primarily rock songs that
happen to have the disco label attached to the artist could inadvertently be included
in the dataset The final list of genres is as follows
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
                 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
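The tag-matching rule can be sketched as below. The real pipeline iterates over the MSD's HDF5 files to read artist_mbtags; here the matching itself is shown on plain lists, and exact case-insensitive matching is my assumption about how tags were compared:

```python
def is_electronic(artist_mbtags, electronic_genres):
    # keep a song if any of the artist's musicbrainz tags matches one of the
    # electronic genre tags (case-insensitive exact match)
    tags = {t.strip().lower() for t in artist_mbtags}
    return any(genre in tags for genre in electronic_genres)

electronic = {'house', 'techno', 'trance', 'dubstep', 'ambient', 'electronic'}
keep = is_electronic(['Techno', 'german'], electronic)   # matches 'techno'
skip = is_electronic(['rock', 'pop'], electronic)        # no electronic tags
```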
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musi-
cally informed manner The study first takes the raw sound data and converts it into
a distribution of each pitch where 0 is no detection of the pitch and 1 the strongest
amount. Then it computes the most likely chord by comparing the 4 most
common types of chords in popular music (major minor dominant 7 and minor 7) to
the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_Cmaj = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

ρ(CT, c) = (1/12) Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)
where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note
that the summation is over each individual pitch in the 12 pitch classes The chord
template with the highest value of ρ is selected as the chord for the time frame
After this is performed for each time frame the values are smoothed and then the
change between adjacent chords is observed The reasoning behind this step is that
by measuring the relative distance between chords rather than the chords themselves
all songs can be compared in the same manner even though they may have different
key signatures Finally the study takes the types of chord changes and classifies
them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".
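The chord-template matching step described above can be sketched with scipy's spearmanr. The four template chords rooted at C follow the text (major, minor, dominant 7, minor 7), and rotating a template by r semitones roots it at pitch class r:

```python
import numpy as np
from scipy.stats import spearmanr

# the 4 template chord types, rooted at C (index 0)
TEMPLATES = {
    "maj":  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    "min":  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "dom7": [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    "min7": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}

def best_chord(chroma):
    # pick the (root, type) template whose rotation is most rank-correlated
    # with the observed 12-element chroma frame
    best, best_rho = None, -np.inf
    for name, tpl in TEMPLATES.items():
        for root in range(12):
            rho, _ = spearmanr(np.roll(tpl, root), chroma)
            if rho > best_rho:
                best, best_rho = (root, name), rho
    return best

# a frame dominated by the pitch classes C, E and G
frame = [1.0, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.95, 0.1, 0.1, 0.1, 0.1]
root, chord_type = best_chord(frame)
```

The frame above is matched to the C major template, since its three strongest pitch classes line up exactly with that template's 1's.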
In my preliminary implementation of this method on an electronic dance music
corpus I made a few modifications to Mauchrsquos study First I smoothed out time
frames before computing the most probable chords rather than smoothing the most
probable chords I did this to save time and to reduce volatility in the chord
measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly over 1 second and for preliminary testing appeared to be a good interval for each
time block Second as mentioned in the literature section I did not abstract the
chord changes into H-topics This decision also stemmed from time constraints since
deriving semantic chord meaning from EDM songs would require careful research
into the types of harmonies and sounds common in that genre of music Below I
included a high-level visualization of the pitch metadata found in a sample song
"Firestarter" by The Prodigy and how I converted the metadata into a chord change
vector that I could then feed into the Dirichlet Process algorithm
[Figure: pitch preprocessing pipeline for "Firestarter" by The Prodigy. Starting from the raw pitch data, an N×12 matrix where N is the number of time frames and 12 the number of pitch classes, the pitch distribution is averaged over every block of 5 time frames; the most likely chord for each block is calculated using Spearman's rho (e.g. F# major = (0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0)); and for each pair of adjacent chords the change between them is encoded and its count incremented in a table of chord change frequencies (192 possible chord changes), yielding a final 192-element vector chord_changes where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the numbers by
the length of the song so that each song's number of chord changes was measured per
second
2.3.3 Timbre Preprocessing
For timbre I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8] After collecting all song metadata I took a random
sample of 20 songs from each year starting at 1970 The reason I forced the sampling
to 20 randomly sampled songs from each year and did not take a random sample of
songs from all years at once was to prevent bias towards any one type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include
a disproportionate amount of more recent songs In order to not miss out on sounds
that may be more prevalent in older songs I required a set number of songs from each
year Next from each randomly selected song I selected 20 random timbre frames
in order to prevent any biases in data collection within each song In total there
were 42 × 20 × 20 = 16800 timbre frames collected. Next I clustered the timbre frames
using a Gaussian Mixture Model (GMM) varying the number of clusters from 10 to
100 and selecting the number of clusters with the lowest Bayes Information Criterion
(BIC) a statistical measure commonly used to calculate the best fitting model The
BIC was minimized at 46 timbre clusters I then re-ran the GMM with 46 clusters
and saved the mean values of each of the 12 timbre segments for each cluster formed
In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs each song now had the same 46 timbre clusters
but different frequencies in each song When reading in the metadata from each song
I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2 Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song Finally
as with the pitch data I divided all observed counts by the duration of the song in
order to normalize each songrsquos timbre counts
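The whole timbre pipeline can be sketched end to end on synthetic stand-in data; for brevity the GMM here is fixed at 3 clusters rather than selected by BIC over the 10-100 range described above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# stand-in corpus: each "song" is an (n_frames, 12) array of timbre vectors
songs = [rng.normal(rng.integers(0, 3) * 6.0, 1.0, size=(300, 12))
         for _ in range(30)]

# pool a fixed number of random frames per song (the thesis pools 20 frames
# from each of 20 songs per year) and fit the timbre-cluster GMM on the pool
pool = np.vstack([s[rng.choice(len(s), 20, replace=False)] for s in songs])
gmm = GaussianMixture(n_components=3, random_state=0).fit(pool)

def timbre_histogram(song, duration_sec):
    # per-second frequency of each timbre cluster within one song
    counts = np.bincount(gmm.predict(song), minlength=gmm.n_components)
    return counts / duration_sec

hist = timbre_histogram(songs[0], duration_sec=212.0)
```

Every song then shares the same fixed set of timbre clusters, so the per-second cluster frequencies are directly comparable across songs, just like the chord-change table.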
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed I ran the Dirichlet Process on the
data For each song I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list giving each song a total of 238
features However there is a problem with this setup The pitch data will inherently
dominate the clustering process since it contains more than four times as many features
as timbre While there is no built-in function in scikit-learn's DPGMM process to
give different weights to each feature I considered another possibility to remedy
this discrepancy duplicating the timbre vector a certain number of times and
concatenating that to the feature set of each song While this strategy runs the risk
of corrupting the feature set and turning it into something that does not accurately
represent each song it is important to keep in mind that even without duplicating
the timbre vector the feature set consists of two separate feature sets concatenated
to each other Therefore timbre duplication appears to be a reasonable strategy to
weigh pitch and timbre more evenly
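A sketch of this duplication strategy; the number of timbre copies below is my illustrative choice (the thesis does not fix it here), picked so the repeated 46-element block roughly matches the 192 pitch features in length:

```python
import numpy as np

def weighted_features(pitch_vec, timbre_vec, timbre_copies=4):
    # concatenate the 192 chord-change and 46 timbre frequencies, repeating
    # the timbre block so both modalities carry comparable weight (192 vs 46)
    return np.concatenate([pitch_vec] + [timbre_vec] * timbre_copies)

pitch = np.zeros(192)
timbre = np.ones(46)
x = weighted_features(pitch, timbre)   # 192 + 4 * 46 = 376 features
```

With 4 copies the timbre block contributes 184 of the 376 features, close to the pitch block's 192.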
After this modification I tweaked a few more parameters before obtaining my
final results Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second but it also had the undesired effect
of making the data too small Timbre and pitch frequencies per second were almost
always less than 1.0 and many times hovered as low as 0.002 for nonzero values
Because all of the values were very close to each other using common values of α
in the range of 0.1 to 1000-2000 was insufficient to push the songs into different
clusters As a result every song fell into the same cluster Increasing the value
of α by several orders of magnitude to well over 10 million fixed the problem but
this solution presented two problems First tuning α to experiment with different
ways to cluster the music would be problematic since I would have to work with
an enormous range of possible values for α Second pushing α to such high values
is not appropriate for the Dirichlet Process Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters but a value
of α that high is in principle always assigning each new song to a new cluster On
the other hand varying α between 0.1 and 1000 for example presents a much wider
range of flexibility when assigning clusters While this may be possible by varying
the values of α an extreme amount with the data as it currently is we are using
the Dirichlet Process in a way it should mathematically not be used Therefore
multiplying all of the data by a constant value so that we can work in the appropriate
range of α is the ideal approach After some experimentation I found that k=10 was
an appropriate scaling factor After initial runs of the Dirichlet Process I found out
that there was a slight issue with some of the earlier songs Since I had only artist
genre tags not specific song tags for each song I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre including
the generic term 'electronic' There were some bands mostly older ones from the
1960s and 1970s like Electric Light Orchestra which had some electronic music but
mostly featured rock funk disco or another genre Given that these artists featured
mostly non-electronic songs I decided to exclude them from my study and generate
a blacklist indicating these music artists While it was infeasible to look through
every single song and determine whether it was electronic or not I was able to look
over the earliest songs in each cluster These songs were the most important to verify
as electronic because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for
The goal of this thesis is to identify different groups in which EM songs are
clustered and identify the most unique artists and genres While the second task is
very simple because it requires looking at the earliest songs in each cluster the first
is difficult to gauge the effectiveness of While I can look at the average chord change
and timbre category frequencies in each category as well as other metadata putting
more semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process For this reason
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clustering in each category examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α I set the upper limit of components, or clusters, allowed to 50. The three values of α resulted in 9, 14, and 19 clusters, respectively.
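The comparison across α values can be sketched as a simple sweep; the data here is a random stand-in, so the resulting cluster counts will not reproduce the 9, 14, and 19 reported above:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 6)) * 10   # stand-in for the scaled song feature vectors

results = {}
for alpha in (0.05, 0.1, 0.2):
    model = BayesianGaussianMixture(
        n_components=50,             # upper limit on clusters, as in the thesis
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,
        random_state=0,
    ).fit(X)
    results[alpha] = len(set(model.predict(X)))   # populated clusters per α
```

Running the same data through several α values and counting the populated clusters is the same tuning loop described in the paragraph above.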
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05 the Dirichlet Process split the songs into 9 clusters Below are
the distribution of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially so cluster numbers 5 7 and 10 are
skipped)
Figure 3.1 Song year distributions for α = 0.05
For each value of α I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results The green
lines correspond to timbre and the blue lines to pitch
Figure 3.2 Timbre and pitch distributions for α = 0.05
A table of each cluster formed the number of songs in that cluster and descriptions
of pitch timbre and rhythmic qualities characteristic of songs in that cluster are
shown below
Cluster Song Count Characteristic Sounds
0 6481 Minimalist, industrial space sounds, dissonant chords
1 5482 Soft, New Age, ethereal
2 2405 Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 360 Very dense and complex synths, slightly darker tone
4 4550 Heavily distorted rock and synthesizer
6 2854 Faster-paced 80s synth rock, acid house
8 798 Aggressive beats, dense house music
9 1464 Ambient house, trancelike, strong beats, mysterious tone
11 1597 Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster Song Count Characteristic Sounds
0 1339 Instrumental and disco with 80s synth
1 2109 Simultaneous quarter-note and sixteenth-note rhythms
2 4048 Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3 1353 Strong repetitive beats, ambient
4 2446 Strong simultaneous beat and synths; synths defined but echo
5 2672 Calm, New Age
6 542 Hi-hat cymbals, dissonant chord progressions
7 2725 Aggressive punk and alternative rock
9 1647 Latin; rhythmic emphasis on first and third beats
11 835 Standard medium-fast rock instruments/chords
16 1152 Orchestral, especially violins
18 40 "Martian alien" sounds, no vocals
20 1590 Alternating strong kick and strong high-pitched clap
28 528 Roland TR-like beats; kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. Three of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster Song Count Characteristic Sounds
0 4075 Nostalgic and sad-sounding synths and string instruments
1 2068 Intense, sad, cavernous (mix of industrial metal and ambient)
2 1546 Jazz/funk tones
3 1691 Orchestral with heavy 80s synths, atmospheric
4 343 Arpeggios
5 304 Electro, ambient
6 2405 Alien synths, eerie
7 1264 Punchy kicks and claps, 80s/90s tilt
8 1561 Medium tempo, 4/4 time signature, synths with intense guitar
9 1796 Disco rhythms and instruments
10 2158 Standard rock with few (if any) synths added on
12 791 Cavernous, minimalist, ambient (non-electronic instruments)
14 765 Downtempo, classic guitar riffs, fewer synths
16 865 Classic acid house sounds and beats
17 682 Heavy Roland TR sounds
22 14 Fast, ambient, classic orchestral
23 578 Acid house with funk tones
30 31 Very repetitive rhythms, one or two tones
34 88 Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic melodies (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster.

Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths that contain certain chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre-category and chord-change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords in the song occurring next to each other are remaining in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord-change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time.

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
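The chord-change category numbering used above (types 0, 60, 120, and 180 for same-key transitions) follows directly from the encoding implemented in the appendix preprocessing code; a minimal sketch (the function name is mine):

```python
def chord_change_category(c1, c2):
    """Map a pair of detected chords to one of 192 categories.

    Each chord is (chord_type, root) with chord_type in 1..4
    (1=major, 2=minor, 3=dominant-7th major, 4=dominant-7th minor)
    and root in 0..11 (C=0 through B=11), mirroring the appendix code.
    """
    t1, n1 = c1
    t2, n2 = c2
    note_shift = (n2 - n1) % 12    # semitones up from the old root to the new one
    key_shift = 4 * (t1 - 1) + t2  # 1..16, one value per ordered pair of chord types
    return 12 * (key_shift - 1) + note_shift

# The four "no note change" categories named in the text:
assert chord_change_category((1, 0), (1, 0)) == 0    # major -> major
assert chord_change_category((2, 3), (2, 3)) == 60   # minor -> minor
assert chord_change_category((3, 7), (3, 7)) == 120  # dom7 major -> dom7 major
assert chord_change_category((4, 9), (4, 9)) == 180  # dom7 minor -> dom7 minor
```

With 16 ordered type pairs and 12 possible root shifts, the encoding covers all 192 chord-change categories.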
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of the years the songs were released, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.

The timbre/chord-change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as cluster 28_0.1's, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs.

Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various factors working against it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have produced only a small selection of EM songs alongside much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
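A song-level filter of that kind might look like the sketch below; the SQLite schema (tables tids, tags, and tid_tag joined by ROWID) is assumed to match the Last.fm tag database distributed alongside the MSD, and the tag list is illustrative:

```python
import sqlite3

# Illustrative subset of EM tags; the thesis uses a longer list.
EM_TAGS = {"house", "techno", "trance", "ambient", "downtempo", "idm"}

def song_level_em_tids(db_path, em_tags=EM_TAGS):
    """Return the set of track IDs carrying an EM tag at the song level.

    Assumes the schema of the Last.fm tags database (tables tids, tags,
    tid_tag keyed by ROWID); adjust the query if the schema differs.
    """
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT tids.tid, tags.tag FROM tid_tag "
        "JOIN tids ON tid_tag.tid = tids.ROWID "
        "JOIN tags ON tid_tag.tag = tags.ROWID"
    )
    keep = {tid for tid, tag in rows if tag.lower() in em_tags}
    conn.close()
    return keep
```

Filtering on song-level tags rather than artist-level tags would keep, say, a rock band's one EM remix while dropping the rest of its catalog.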
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for the songs were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.
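The BIC-based selection of the number of timbre categories mentioned above can be sketched as follows; the data here is synthetic and the candidate grid is tiny, whereas the thesis found the minimum at 46 categories:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the sampled 12-dimensional timbre frames:
# three well-separated groups of 200 frames each.
rng = np.random.default_rng(2)
frames = np.vstack([rng.normal(loc=c, size=(200, 12)) for c in (-2.0, 0.0, 2.0)])

# Fit a GMM for each candidate number of timbre categories and keep
# the count with the lowest Bayes Information Criterion.
bics = {}
for k in (2, 3, 4, 5):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bics[k] = gmm.bic(frames)
best_k = min(bics, key=bics.get)
```

BIC penalizes extra components, so the winning k balances fit quality against model size rather than always growing with more clusters.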
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from the dataset, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and derive deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing Total time elapsed {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer("{'title'[^}]*}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
               1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
               1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
               1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile("{'title'[^}]*}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C natural
# and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 =  [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                        [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                        [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                        [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                        [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                        [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 =  [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                        [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                        [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                        [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                        [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                        [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-03, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-03, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-03, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
hierarchy of music genres using hierarchical clustering. Additionally, the study attempts to determine whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.
Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)
While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress I would be able to achieve at least one of them. As shown in Figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons", or dictionaries of pitch and timbre terms against which all songs can be compared. For pitch, the original data is an N-by-12 matrix, where N is the number of time segments in the song and 12 is the number of notes in an octave. Each time segment contains the relative strengths of each of the 12 pitches.
However, musical sounds are not merely collections of pitches but, more precisely, chords. Furthermore, the similarity of two songs is determined not by the absolute pitches of their chords but by the progression of chords through the song, all relative to one another. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, counting the change to the next chord at each time step, and generating a table of chord-change frequencies for each song.
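The chord-change counting just described can be sketched in a few lines. The encoding below, where each chord is a (quality, root) pair and a change is recorded as a root interval modulo 12, is one plausible way to make the table transposition-invariant; it is an illustrative assumption, not necessarily the exact representation used in Mauch's pipeline or in this thesis.

```python
from collections import Counter

def chord_change_table(chords):
    """Build a table of chord-change frequencies for one song.

    chords: list of (quality, root) pairs, e.g. (1, 0) for C major.
    A change is stored as (quality_from, quality_to, root interval mod 12),
    so transposing the whole song leaves the table unchanged.
    """
    changes = Counter()
    for (q1, r1), (q2, r2) in zip(chords, chords[1:]):
        changes[(q1, q2, (r2 - r1) % 12)] += 1
    total = float(sum(changes.values()))
    return {change: count / total for change, count in changes.items()}

# a I-IV-V-I progression in C major, and the same progression transposed to D:
song = [(1, 0), (1, 5), (1, 7), (1, 0)]
transposed = [(1, 2), (1, 7), (1, 9), (1, 2)]
# both songs yield the same chord-change table
```

Because only the intervals between successive roots enter the table, the two progressions above are indistinguishable, which is exactly the invariance the transposition argument calls for.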
Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords with which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM): iterating from k=1 to k=N clusters, where N is a large number, it fits a GMM under each prior assumption of k clusters and computes the Bayesian Information Criterion (BIC) for each model. The model with the lowest of the N BIC values is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
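This model-selection loop is straightforward to reproduce. The sketch below uses the current scikit-learn GaussianMixture API (the class available when this thesis was written had a different name) and synthetic two-cluster data as a stand-in for real timbre frames.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_k_by_bic(X, max_k=10, seed=0):
    """Fit a GMM for each candidate k and keep the one with the lowest BIC."""
    best_k, best_bic, best_model = None, np.inf, None
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        bic = gmm.bic(X)  # lower is better: fit penalized by model size
        if bic < best_bic:
            best_k, best_bic, best_model = k, bic, gmm
    return best_k, best_model

# two well-separated synthetic blobs should yield k = 2
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + 8.0])
k, model = select_k_by_bic(X)
```

The BIC's complexity penalty is what prevents the loop from always preferring the largest k: each extra component must buy enough log-likelihood to pay for its additional parameters.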
For my research I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer on top of pitch and timbre by identifying the most common patterns of chord changes and the most common timbre rhythms, creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction in my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples covering all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as for timbre, but I would also have to take care to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources to pursue for this thesis. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to treat pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling chapter of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but segments of songs, usually segments that showcase the chorus, are not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues: obtaining a high-quality corpus of song data would be challenging, since scripts that crawl music-sharing platforms may not capture all of the music I am looking for, and once I had the audio files, I would have to apply audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume that this number is fixed. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or wrongly separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.
One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters are to be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tuning the parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criterion for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
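Both the "rich get richer" dynamic and the role of α can be seen directly in a small simulation of the Chinese Restaurant Process, a standard constructive view of the DP. This is an illustrative sketch only, not the clustering code used in the thesis.

```python
import random

def crp_num_clusters(n, alpha, seed=0):
    """Simulate cluster formation under the Chinese Restaurant Process:
    point i starts a new cluster with probability alpha / (alpha + i),
    otherwise it joins an existing cluster chosen in proportion to the
    cluster's current size (the 'rich get richer' rule)."""
    rng = random.Random(seed)
    sizes = []  # current size of each cluster
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            sizes.append(1)  # start a new cluster
        else:
            r = i * rng.random()  # i points have already been assigned
            acc = 0
            for j, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[j] += 1  # join cluster j, proportional to its size
                    break
    return len(sizes)

# averaged over a few seeds, a larger alpha produces more clusters
few = sum(crp_num_clusters(1000, 0.1, seed=s) for s in range(10)) / 10.0
many = sum(crp_num_clusters(1000, 10.0, seed=s) for s in range(10)) / 10.0
```

Note how the new-cluster probability α/(α+i) shrinks as more points arrive, matching the observation that the chance of forming a new group decreases regardless of α.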
The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods of assigning songs to clusters [11]. While the mathematical details of this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same dataset and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs to the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection".
The last argument that a user inputs to the scikit-learn DPGMM implementation indicates the upper bound on the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused on tuning α to modify the number of clusters formed.
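For reference, the DPGMM class used here was later removed from scikit-learn; in current versions the same stick-breaking model is exposed as BayesianGaussianMixture. The sketch below wires up the three arguments just discussed (α, the N-by-m data, and the upper bound of 50 clusters) using a random stand-in feature matrix rather than real song features.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# stand-in for the N x m song-feature matrix: two well-separated groups
X = np.vstack([rng.randn(150, 4), rng.randn(150, 4) + 5.0])

dpgmm = BayesianGaussianMixture(
    n_components=50,                                 # upper bound on clusters
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=1.0,                  # the concentration alpha
    max_iter=500,
    random_state=0,
).fit(X)

# only clusters that actually received data keep non-negligible weight
n_effective = int(np.sum(dpgmm.weights_ > 0.01))
```

Raising or lowering `weight_concentration_prior` plays the role of tuning α: the upper bound of 50 stays fixed while the number of clusters with appreciable weight changes.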
2.2 Feature Selection
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features.
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist is db9...)
artist_mbtags: shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935) (max loudness during each segment)
segments_loudness_max_time: shape = (935) (time of the max loudness during each segment)
segments_loudness_start: shape = (935) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener has his or her own opinions about what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the prevalence of different tones, i.e. sounds that sound different despite having the same pitch. Therefore segments_timbre is another feature that is considered in each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum n' bass and happy hardcore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
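The filtering step described above can be sketched as follows. This is an illustration rather than the thesis's actual collection script: a plain dictionary stands in for the MSD's HDF5 song structure, and the genre list is abbreviated.

```python
# Sketch of the artist-tag filter: a song is kept if any of its artist's
# genre tags matches a target electronic genre. A plain dictionary stands
# in here for the MSD's HDF5 song structure and its artist_mbtags field.
target_genres = {'house', 'techno', 'drum and bass', 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'downtempo', 'industrial', 'synthpop',
                 'idm', '8-bit', 'ambient', 'electronic'}

def is_electronic(artist_mbtags):
    # Keep the song if any artist-level tag is a target genre.
    return any(tag.lower() in target_genres for tag in artist_mbtags)

songs = [
    {'title': 'A', 'artist_mbtags': ['techno', 'german']},
    {'title': 'B', 'artist_mbtags': ['classic rock']},
]
kept = [s['title'] for s in songs if is_electronic(s['artist_mbtags'])]
```

Note that, as in the text, this keys off the artist's tags, so an artist with any electronic tag passes the filter even for their non-electronic songs.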
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as
CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed over every template chord:

ρ_{CT,c} = (1/12) Σ_{i=1}^{12} (CT_i − mean(CT))(c_i − mean(c)) / (σ_CT σ_c)
where mean(CT) is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
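As a concrete sketch of this matching step (an illustration of the formula above, not the study's actual code), the correlation can be computed for every rotation of the four template chords and the best-scoring combination taken as the frame's chord:

```python
import math

# The four template chord types, rooted at C (index 0 = C, ..., 11 = B).
TEMPLATES = {
    'major':      [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'minor':      [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'dominant 7': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    'minor 7':    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def rho(ct, c):
    """Correlation between template chord ct and chroma frame c, following
    the formula in the text (population standard deviations, n = 12)."""
    m_ct, m_c = sum(ct) / 12, sum(c) / 12
    cov = sum((ct[i] - m_ct) * (c[i] - m_c) for i in range(12)) / 12
    s_ct = math.sqrt(sum((x - m_ct) ** 2 for x in ct) / 12)
    s_c = math.sqrt(sum((x - m_c) ** 2 for x in c) / 12)
    return cov / (s_ct * s_c)

def best_chord(chroma):
    """Score every root/type combination; return (score, root, type)."""
    best = None
    for chord_type, template in TEMPLATES.items():
        for root in range(12):
            # Rotate the C-rooted template so its root sits at `root`.
            rotated = template[-root:] + template[:-root] if root else template
            score = rho(rotated, chroma)
            if best is None or score > best[0]:
                best = (score, NOTES[root], chord_type)
    return best

# A chroma frame that is exactly a C major triad correlates perfectly
# (rho = 1) with the C major template.
result = best_chord([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])
```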
After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I have included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: visualization of the pitch-preprocessing pipeline, illustrated on the first 5 time frames of "Firestarter" by The Prodigy. Start with the raw pitch data, an N×12 vector, where N is the number of time frames in the song and 12 the number of pitch classes. Average the distribution of pitches over every block of 5 time frames. Calculate the most likely chord for each block using Spearman's rho (here F♯ major, (0,1,0,0,0,0,1,0,0,0,1,0)). For each pair of adjacent chords, calculate the change between them (here major to major, step size 2) and increment its count in a table of chord change frequencies (192 possible chord changes). The final result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
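Turning a song's detected chord sequence into the final normalized 192-element vector can be sketched as below. The index encoding (from-type, to-type, root interval), with chord types ordered major, minor, dominant 7, minor 7, is one plausible layout of the 4 × 4 × 12 = 192 possible changes and is consistent with the type numbering discussed in the Analysis section (e.g. type 0 = major → major with no note change, type 60 = minor → minor), but it is an assumption, not the thesis's published code.

```python
# Sketch: chord sequence -> normalized 192-element chord-change vector.
# Chords are (root, type) pairs; the change encoding below is a plausible
# illustration (4 from-types x 4 to-types x 12 root intervals = 192).
CHORD_TYPES = ['major', 'minor', 'dominant 7', 'minor 7']

def change_code(chord_a, chord_b):
    root_a, type_a = chord_a
    root_b, type_b = chord_b
    interval = (root_b - root_a) % 12  # key-independent root step
    return (CHORD_TYPES.index(type_a) * 4
            + CHORD_TYPES.index(type_b)) * 12 + interval

def chord_change_vector(chords, duration_seconds):
    counts = [0.0] * 192
    for a, b in zip(chords, chords[1:]):
        counts[change_code(a, b)] += 1
    # Divide by song length so songs of different durations are comparable
    # (chord changes per second), as described in the text.
    return [c / duration_seconds for c in counts]

# Example: three chord blocks, hence two chord changes, in a 15-second clip.
chords = [(5, 'major'), (7, 'major'), (7, 'minor')]
vec = chord_change_vector(chords, 15.0)
```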
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre dimensions for each cluster formed. In the same way that every song had the same 192 chord changes, whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
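A minimal scikit-learn sketch of this clustering step follows. Synthetic data stands in for the 16,800 sampled 12-dimensional timbre frames, a small cluster-count range stands in for the 10-100 sweep, and diagonal covariances are an assumption made here to keep the sketch small.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for the sampled 12-dimensional timbre frames:
# three well-separated Gaussian blobs of 200 frames each.
frames = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(200, 12))
    for c in (-5.0, 0.0, 5.0)
])

# Fit GMMs over a range of cluster counts and keep the lowest BIC,
# mirroring the 10-100 sweep described above.
best_k, best_bic = None, np.inf
for k in range(2, 8):
    gmm = GaussianMixture(n_components=k, covariance_type='diag',
                          random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic = k, bic

# Re-fit with the chosen k; each frame is then assigned to its most
# likely cluster and counted, as in the per-song frequency counts above.
final = GaussianMixture(n_components=best_k, covariance_type='diag',
                        random_state=0).fit(frames)
labels = final.predict(frames)
counts = np.bincount(labels, minlength=best_k)
```

In the full pipeline, `counts` would be computed per song and divided by the song's duration, exactly as with the chord-change vector.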
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
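As a sketch of this reweighting (the duplication factor of 3 here is illustrative; the text does not fix an exact number):

```python
import numpy as np

# Hypothetical per-song features: 192 chord-change frequencies and
# 46 timbre-cluster frequencies, both already normalized per second.
pitch = np.zeros(192)
timbre = np.ones(46)

# Duplicating the timbre block brings its share of the feature vector
# closer to the pitch block's, crudely reweighting the two.
dup = 3  # illustrative duplication factor
features = np.concatenate([pitch] + [timbre] * dup)
```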
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but the solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it mathematically should not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.
After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, putting more semantic interpretations on what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
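The DPGMM class used here has since been removed from scikit-learn; in current versions, an equivalent run looks roughly like the following sketch, where BayesianGaussianMixture with a Dirichlet-process prior plays the same role. Synthetic data stands in for the 238-feature song vectors, and the specific parameter values are illustrative.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for the scaled song feature vectors: two groups of
# "songs" with clearly different feature profiles.
songs = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(150, 8)),
    rng.normal(loc=4.0, scale=0.3, size=(150, 8)),
])

# Truncated Dirichlet-process mixture: up to 50 components, with the
# concentration parameter playing the role of alpha described above.
dp = BayesianGaussianMixture(
    n_components=50,
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.1,   # alpha
    max_iter=500,
    random_state=0,
).fit(songs)

labels = dp.predict(songs)
n_clusters = len(np.unique(labels))
```

As in the runs above, the model decides on its own how many of the 50 allowed components to actually use; sweeping `weight_concentration_prior` over 0.05, 0.1, and 0.2 mirrors the three α settings reported below.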
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist industrial space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster paced, 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but with echo
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin, rhythmic emphasis on first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist, ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast, ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense
melodies (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together based on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters for α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 featured vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and spacey sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. From this α = 0.1 run, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters; 3 of these clusters contained only one song each, and upon
listening to each of these songs I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters (10 out of 19) contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α runs, like cluster 17 (α = 0.2), which contains Roland TR
drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster
often vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing clusters: the y-axes for all of the years are quite small,
implying that many of the timbre values averaged out because the songs in each cluster
were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only
exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters that a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
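The qualitative role of α has a clean interpretation under the Chinese restaurant process view of the Dirichlet Process: each new song starts a new cluster with probability proportional to α, so larger α permits more clusters. A minimal pure-Python sketch (not the thesis code; the song count of roughly 23,000 is the approximate size of the EM subset discussed in Section 4.1):

```python
def expected_num_clusters(alpha, n):
    """Expected number of occupied clusters after n draws from a
    Chinese restaurant process with concentration parameter alpha:
    E[K_n] = sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))

n_songs = 23000  # approximate size of the EM subset of the MSD
for alpha in (0.05, 0.1, 0.2):
    print(alpha, expected_num_clusters(alpha, n_songs))
```

Note that the prior expectation alone is far smaller than the cluster counts actually found at these α values; the observed counts are driven mostly by the pitch and timbre likelihoods. The monotone effect is what matters here: for small α, the prior's expected cluster count grows roughly in proportion to α.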
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; I then offer potential paths for researchers to build upon my
experiment, and I close with some final remarks regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the
electronic music that was available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included were represented
by only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist carried a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
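The gap between artist-level and song-level filtering can be sketched directly. This is an illustrative toy, not the thesis code: the tag set is an abbreviated stand-in for the real genre list, the artist and track records are invented, and in the real pipeline artist tags come from the MSD's mbtags field while song-level tags would come from the Last.fm dataset:

```python
# abbreviated stand-in for the real list of predetermined EM genre tags
EM_TAGS = {'house', 'techno', 'trance', 'ambient'}

def is_em_artist_level(song, artist_tags):
    # the thesis criterion: keep every song whose artist carries an EM tag,
    # even when this particular song is rock or another non-EM genre
    return any(t in EM_TAGS for t in artist_tags[song['artist']])

def is_em_song_level(song):
    # the stricter criterion song-level tags would allow: require an
    # EM tag on the song itself
    return any(t in EM_TAGS for t in song.get('tags', []))

# invented example records (artist, titles, and tags are hypothetical)
artist_tags = {'Some Artist': ['techno', 'rock']}
songs = [{'artist': 'Some Artist', 'title': 'Track A', 'tags': ['techno']},
         {'artist': 'Some Artist', 'title': 'Track B', 'tags': ['rock']}]

print([s['title'] for s in songs if is_em_artist_level(s, artist_tags)])  # both tracks
print([s['title'] for s in songs if is_em_song_level(s)])                 # only Track A
```

The artist-level rule admits the rock track because its artist also carries a techno tag; the song-level rule excludes it.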
Another, more addressable, weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had straightforward semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the timbre measurements on each graph, as I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is at effectively clustering songs into distinct categories.
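As one concrete example of the kind of timbre-based comparison [15] motivates, each song's normalized timbre-category counts (the timbre_cat_counts vector computed in Appendix A.2) can be treated as a histogram, and pairs of songs compared by cosine similarity. This is a sketch of one possible metric over invented toy histograms, not the method of [15]:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two per-song timbre-category histograms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# invented 5-bin histograms; real vectors would have one bin per timbre category
four_on_floor = [0.70, 0.10, 0.10, 0.05, 0.05]
similar_beat  = [0.60, 0.20, 0.10, 0.05, 0.05]
ambient_pad   = [0.05, 0.05, 0.10, 0.20, 0.60]

print(cosine_similarity(four_on_floor, similar_beat))  # high: similar timbre profiles
print(cosine_similarity(four_on_floor, ambient_pad))   # low: dissimilar profiles
```

A similarity matrix built this way would give a quantitative check on whether songs placed in the same Dirichlet Process cluster really do sound alike.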
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions, procedures,
evaluations of whether clustering was effective, and closer musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue, building an effective corpus of
music data for the MSD and constantly updating it, might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without extensive
funding and influence. Once these problems are resolved, the songs in the dataset are
accessible, and methods for comparing songs to each other are in place, the next steps
would be to further analyze the results. How do the most unique artists for their time
compare to the most popular artists? Is there considerable overlap? How long does it
take for a style to grow in popularity, if it even does? And lastly, how can these findings
be used to compose new genres of music and envision who and what will become popular
in the future? All of these questions may require supplementary information sources,
with respect to the popularity of songs and artists for example, and many of these
additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
continues to grow, and groups such as Spotify amass greater amounts of information
and deeper insights into that information, this relatively new field of study will hopefully
grow as well. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has, up
until now, been mostly analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of each electronic song from the MSD so that
chord changes can be computed and the Dirichlet Process run on them.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# NOTE: the exact regex was garbled in transcription; this assumes each song's
# dict literal begins at its 'title' key
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # one count per timbre category (46 of them; the transcript's literal bound was garbled)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, len(msd_utils.TIMBRE_CLUSTERS))]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden for local runs
# NOTE: the exact regex was garbled in transcription; this assumes each song's
# dict literal begins at its 'title' key
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # keep each song with probability N / (songs in its year), capped at 1
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [[ 1.38679881e-01,  3.95702571e-02,  2.65410235e-02,
                     7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
                     8.71851698e-03, -1.17595855e-02,  1.07227900e-02,
                     8.75951680e-03,  5.40391877e-03,  6.17638908e-03],
                   [ 3.14344510e+00,  1.17405599e-01,  4.08053561e+00,
                    -1.77934450e+00,  2.93367968e+00, -1.35597928e+00,
                    -1.55129489e+00,  7.75743158e-01,  6.42796685e-01,
                     1.40794256e-01,  3.37716831e-01, -3.27103815e-01],
                   [ 3.56548165e-01,  2.73288705e+00,  1.94355982e+00,
                     1.06892477e+00,  9.89739475e-01, -8.97330631e-02,
                     8.73234495e-01, -2.00747009e-03,  3.44488367e-01,
                     9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
                   [ 4.22442037e-01,  4.14115783e-01,  1.43926557e-01,
                    -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
                    -6.83151409e-02,  9.86816882e-02,  2.43219098e-02,
                     6.93558977e-02,  6.80121418e-03,  3.97485360e-02],
                   [ 1.94727799e-01, -1.39027782e+00, -2.39875671e-01,
                    -2.84583677e-01,  1.92334219e-01, -2.83421048e-01,
                     2.15787541e-01,  1.14840341e-01, -2.15631833e-01,
                    -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
                   [ 1.96565167e-01,  4.98702717e-02, -3.43697282e-01,
                     2.54170701e-01,  1.12441266e-02,  1.54740401e-01,
                    -4.70447408e-02,  8.10868802e-02,  3.03736697e-03,
                     1.43974944e-03, -2.75044913e-02,  1.48634678e-02],
                   [ 2.21364497e-01, -2.96205105e-01,  1.57754028e-01,
                    -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
                    -1.38139882e-01, -5.54936599e-02,  1.66886836e-01,
                     6.46238260e-02,  1.24093863e-02, -2.09274345e-02],
                   [ 2.12823455e-01, -9.32652720e-02, -4.39611467e-01,
                    -2.02814479e-01,  4.98638770e-02, -1.26572488e-01,
                    -1.11181799e-01,  3.25075635e-02,  2.01416694e-02,
                    -5.69216463e-02,  2.61922912e-02,  8.30817468e-02],
                   [ 1.62304042e-01, -7.34813956e-03, -2.02552550e-01,
                     1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
                    -6.20429191e-03, -6.08892354e-02,  1.02883628e-02,
                     3.84878478e-02, -8.72920419e-03,  2.37291230e-02],
                   [ 1.69023095e-01,  6.81311168e-02, -3.71039856e-02,
                    -2.13139780e-02, -4.18752028e-03,  1.36407740e-01,
                     2.58515825e-02, -4.10328777e-04,  2.93149920e-02,
                    -1.97874734e-02,  2.01177066e-02,  4.29260690e-03],
                   [ 4.16829358e-01, -1.28384095e+00,  8.86081556e-01,
                     9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
                    -3.19865507e-02, -1.71517045e-02,  3.47472066e-02,
                    -3.53047665e-02,  5.58354602e-02, -5.06222122e-02],
                   [ 3.83948137e-01,  1.06020034e-01,  4.01191058e-01,
                     1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
                     2.27589858e-02, -5.67352733e-02,  3.84666644e-02,
                    -2.15828055e-02, -1.67817151e-02,  1.15426241e-01],
                   [ 9.07946444e-01,  3.26120397e+00,  2.98472002e+00,
                    -1.42615404e-01,  1.29886103e+00, -4.53380431e-01,
                     1.54008478e-01, -3.55297093e-02, -2.95809181e-01,
                     1.57037690e-01, -7.29692046e-02,  1.15180285e-01],
                   [ 1.60870896e+00, -2.32038235e+00, -7.96211044e-01,
                     1.55058968e+00, -2.19377663e+00,  5.01030526e-01,
                    -1.71767279e+00, -1.36642470e+00, -2.42837527e-01,
                    -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
                   [ 6.42870687e-01,  1.34486839e+00,  2.16026845e-01,
                    -2.13180345e-01,  3.10866747e-01, -3.97754955e-01,
                    -3.54439151e-01, -5.95938041e-04,  4.95054274e-03,
                     4.67013422e-02, -1.80823854e-02,  1.25808320e-01],
                   [ 1.16780496e+00,  2.28141229e+00, -3.29418720e+00,
                    -1.54239912e+00,  2.12372153e-01,  2.51116768e+00,
                     1.84273560e+00, -4.06183916e-01,  1.19175125e+00,
                    -9.24407446e-01,  6.85444429e-01, -6.38729005e-01],
                   [ 2.39097414e-01, -1.13382447e-02,  3.06327342e-01,
                     4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
                     3.46533705e-02,  1.46440386e-02,  6.88291154e-02,
                     1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
                   [ 1.74850329e-01, -1.86077411e-01,  2.69285838e-01,
                     5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
                    -5.01920042e-03, -1.14565540e-02, -2.61300268e-03,
                    -6.94872458e-03,  1.20157063e-02,  2.01341977e-02],
                   [ 1.93220674e-01,  1.62738332e-01,  1.72794061e-02,
                     7.89933755e-02,  1.58494767e-01,  9.04541006e-04,
                    -3.33177052e-02, -1.42411500e-01, -1.90471155e-02,
                    -2.41622739e-02, -2.57382438e-02,  2.84895062e-02],
                   [ 3.31179197e+00, -1.56765268e-01,  4.42446188e+00,
                     2.05496297e+00,  5.07031622e+00, -3.52663849e-02,
                    -5.68337901e+00, -1.17825301e+00,  5.41756637e-01,
                    -3.15541339e-02, -1.58404846e+00,  7.37887234e-01],
                   [ 2.36033237e-01, -5.01380019e-01, -7.01568834e-02,
                    -2.14474169e-01,  5.58739133e-01, -3.45340886e-01,
                     2.36469930e-01, -2.51770230e-02, -4.41670143e-01,
                    -1.73364633e-01,  9.92353986e-03,  1.01775476e-01],
                   [ 3.13672832e+00,  1.55128891e+00,  4.60139512e+00,
                     9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
                    -3.00065797e+00, -4.41556909e-01, -7.77546208e-01,
                    -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
                   [ 8.50714148e-01,  2.28658856e-01, -3.65260753e+00,
                     2.70626948e+00, -1.90441544e-01,  5.66625676e+00,
                     1.77531510e+00,  2.39978921e+00,  1.10965660e+00,
                     1.58484130e+00, -1.51579214e-02,  8.64324026e-01],
                   [ 1.14302559e+00,  1.18602811e+00, -3.88130412e+00,
                     8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
                     8.56022598e-01, -1.08015106e+00,  1.74840192e-01,
                    -1.35493558e-02, -1.17012561e+00,  1.68572940e-01],
                   [ 3.54117814e+00,  6.12714769e-01,  7.67585243e+00,
                     2.50391333e+00,  1.81374399e+00, -1.46363231e+00,
                    -1.74027236e+00, -5.72924078e-01, -1.20787368e+00,
                    -4.13954661e-01, -4.62561948e-01,  6.78297871e-01],
                   [ 8.31843044e-01,  4.41635485e-01,  7.00724425e-02,
                    -4.72159900e-02,  3.08326493e-01, -4.47009822e-01,
                     3.27806057e-01,  6.52370380e-01,  3.28490360e-01,
                     1.28628172e-01, -7.78065861e-02,  6.91343399e-02],
                   [ 4.90082031e-01, -9.53180204e-01,  1.76970476e-01,
                     1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
                     3.91808304e-01,  2.19368239e-01, -2.06483291e-01,
                    -6.25044005e-02, -1.05547224e-01,  3.18934196e-01],
                   [ 1.49899454e+00, -4.30708817e-01,  2.43770498e+00,
                     7.03149621e-01, -2.28827845e+00,  2.70195855e+00,
                    -4.71484280e+00, -1.18700075e+00, -1.77431396e+00,
                    -2.23190236e+00,  8.20855264e-01, -2.35859902e-01],
                   [ 1.20322544e-01, -3.66300816e-01, -1.25699953e-01,
                    -1.21914056e-01,  6.93277338e-02, -1.31034684e-01,
                    -1.54955924e-03,  2.48094288e-02, -3.09576314e-02,
                    -1.66369415e-03,  1.48904987e-04, -1.42151992e-02],
                   [ 6.52394765e-01, -6.81024464e-01,  6.36868117e-01,
                     3.04950208e-01,  2.62178992e-01, -3.20457080e-01,
                    -1.98576098e-01, -3.02173163e-01,  2.04399765e-01,
                     4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
                   [ 2.06762180e-01, -2.08101829e-01,  2.61977630e-01,
                    -1.71672300e-01,  5.61794250e-02,  2.13660185e-01,
                     3.90259585e-02,  4.78176392e-02,  1.72812607e-02,
                     3.44052067e-02,  6.26899067e-03,  2.48544728e-02],
                   [ 7.39717363e-01,  4.37786285e+00,  2.54995502e+00,
                     1.13151212e+00, -3.58509503e-01,  2.20806129e-01,
                    -2.20500355e-01, -7.22409824e-02, -2.70534083e-01,
                     1.07942098e-03,  2.70174668e-01,  1.87279353e-01],
                   [ 1.25593809e+00,  6.71054880e-02,  8.70352571e-01,
                    -4.32607959e+00,  2.30652217e+00,  5.47476105e+00,
                    -6.11052479e-01,  1.07955720e+00, -2.16225471e+00,
                    -7.95770149e-01, -7.31804973e-01,  9.68935954e-01],
                   [ 1.17233757e-01, -1.23897829e-01, -4.88625265e-01,
                     1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
                    -1.17525019e-02,  5.70221674e-02, -7.67796123e-03,
                     4.17505873e-02, -2.33375716e-02,  1.94121001e-02],
                   [ 1.67511025e+00, -2.75436700e+00,  1.45345593e+00,
                     1.32408871e+00, -1.66172505e+00,  1.00560074e+00,
                    -8.82308160e-01, -5.95708043e-01, -7.27283590e-01,
                    -1.03975499e+00, -1.86653334e-02,  1.39449745e+00],
                   [ 3.20587677e+00, -2.84451104e+00,  8.54849957e+00,
                    -4.44001235e-01,  1.04202144e+00,  7.35333682e-01,
                    -2.48763292e+00,  7.38931361e-01, -1.74185596e+00,
                    -1.07581842e+00,  2.05759299e-01, -8.20483513e-01],
                   [ 3.31279737e+00, -5.08655734e-01,  6.61530870e+00,
                     1.16518280e+00,  4.74499155e+00, -2.31536191e+00,
                    -1.34016130e+00, -7.15381712e-01,  2.78890594e+00,
                     2.04189275e+00, -3.80003033e-01,  1.16034914e+00],
                   [ 1.79522019e+00, -8.13534697e-02,  4.37167420e-01,
                     2.26517020e+00,  8.85377295e-01,  1.07481514e+00,
                    -7.25322296e-01, -2.19309506e+00, -7.59468916e-01,
                    -1.37191387e+00,  2.60097913e-01,  9.34596450e-01],
                   [ 3.50400906e-01,  8.17891485e-01, -8.63487084e-01,
                    -7.31760701e-01,  9.70320805e-02, -3.60023996e-01,
                    -2.91753495e-01, -8.03073817e-02,  6.65930095e-02,
                     1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
                   [ 2.25922929e-01,  2.78461593e-01,  5.39661393e-02,
                    -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
                     2.31027499e-03,  5.87465112e-05,  1.86127188e-02,
                     2.83074747e-02, -1.87198676e-04,  1.24761782e-02],
                   [ 4.53615634e-01,  3.18976020e+00, -8.35029351e-01,
                     7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
                    -1.14521031e+00,  1.00044304e+00, -4.04084981e-01,
                    -4.86030348e-01,  1.05412721e-01,  5.63666445e-02],
                   [ 3.93714086e-01, -3.07226477e-01, -4.87366619e-01,
                    -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
                    -2.15591352e-01, -1.21332941e-01,  1.42245002e-01,
                     5.02984582e-02, -8.05878851e-03,  1.95534173e-01],
                   [ 1.86913010e-01, -1.61000977e-01,  5.95612425e-01,
                     1.87804293e-01,  2.22064227e-01, -1.09008289e-01,
                     7.83845058e-02,  5.15228647e-02, -8.18113578e-02,
                    -2.37860551e-02,  3.41013800e-03,  3.64680417e-02],
                   [ 3.32919314e+00, -2.14341251e+00,  7.20913997e+00,
                     1.76143734e+00,  1.64091808e+00, -2.66887649e+00,
                    -9.26748006e-01, -2.78599285e-01, -7.39434005e-01,
                    -3.87363085e-01,  8.00557250e-01,  1.15628886e+00],
                   [ 4.76496444e-01, -1.19334793e-01,  3.09037235e-01,
                    -3.45545294e-01,  1.30114716e-01,  5.06895559e-01,
                     2.12176840e-01, -4.14296750e-03,  4.52439064e-02,
                    -1.62163990e-02,  6.93683152e-02, -5.77607592e-03],
                   [ 3.00019324e-01,  5.43432074e-02, -7.72732930e-01,
                     1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
                    -2.10011388e-01,  2.78202425e-01,  6.16957205e-02,
                    -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (family, rotation): 1=major, 2=minor, 3=dom7, 4=min7
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
and depending on my progress I would be able to achieve at least one of the levels of
abstraction. As shown in Figure 1.2, each segment of a raw audio file is first broken
down into its 12 timbre MFCCs and pitch components. Next, the study constructs
"lexicons", or dictionaries of pitch and timbre terms to which all songs can be
compared. For pitch, the original data is an N-by-12 matrix, where N is the number of time
segments in the song and 12 the number of notes found in an octave. Each
time segment contains the relative strengths of each of the 12 pitches.
However music sounds are not merely a collection of pitches but more precisely
chords Furthermore the similarity of two songs is not determined by the absolute
pitches of their chords but rather the progression of chords in the song all relative to
each other For example if all the notes in a song are transposed by one step the song
will sound different in terms of absolute pitch but the song will still be recognized
as the original because all of the relative movements from each chord to the next
are the same This phenomenon is captured in the pitch data by finding the most
likely chord played at each time segment then counting the change to the next chord
at each time step and generating a table of chord change frequencies for each song
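This counting step can be sketched as follows. The code-numbering scheme below (4 chord types × 4 chord types × 12 root intervals = 192 codes) is one possible mapping, shown for illustration; it is not necessarily the exact encoding used in the thesis.

```python
import numpy as np

N_TYPES = 4  # major, minor, dominant 7, minor 7

def chord_change_code(prev_chord, next_chord):
    """Encode a change between two (type, root) chords as one of
    4 * 4 * 12 = 192 codes: the ordered pair of chord types plus the
    root interval in semitones (an illustrative mapping)."""
    (t1, r1), (t2, r2) = prev_chord, next_chord
    interval = (r2 - r1) % 12
    return (t1 * N_TYPES + t2) * 12 + interval

def chord_change_frequencies(chords, duration_sec):
    """Count every consecutive chord change and normalize by the song's
    duration, yielding changes per second (the per-song pitch features)."""
    counts = np.zeros(192)
    for c1, c2 in zip(chords, chords[1:]):
        counts[chord_change_code(c1, c2)] += 1
    return counts / duration_sec

# toy example: a clip alternating F# major (0, 6) and G# major (0, 8)
freqs = chord_change_frequencies([(0, 6), (0, 8), (0, 6), (0, 8)], 2.0)
```

Normalizing by duration here anticipates the per-second normalization described later, so songs of different lengths remain comparable.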
Constructing the timbre lexicon is more complicated, since there is no easy analogue
like chords for pitches to compare songs. Mauch's study utilizes a Gaussian Mixture
Model (GMM) by iterating over k=1 to k=N clusters where N is a large number
running the GMM on each prior assumption of k clusters and computing the Bayes
Information Criterion (BIC) for each model The lowest of the N BIC values is found
and that value of k is selected That model contains k different timbre clusters
and each cluster contains the mean timbre value for each of the 12 timbre components
For my research I decided that the pitch and timbre lexicons would be the most
realistic level of abstraction I could obtain. Mauch's study adds an additional layer
to pitch and timbre by identifying the most common patterns of chord changes and
most common timbre rhythms and creating more general tags from these combined
terms such as "stepwise changes indicating modal harmony" for a pitch topic and
"oh rounded mellow" for a timbral topic. There were two problems with using this
final layer of abstraction for my study First attaching semantic interpretations to
the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen
to sound samples containing all of the different timbral categories I identified and
attach interpretations to them. For the chords, not only would I have to
perform the same analysis as for timbre, but also pay careful attention to identifying which
chords correspond to common sound progressions in popular music, a task that I am
not qualified for and did not have the resources to pursue in this thesis. Second,
this final layer of abstraction was not necessary for the end goal of my paper In
fact consolidating my pitch and timbre lexicons into simpler phrases would run the
risk of pigeonholing my analysis and preventing me from discovering more nuanced
patterns in my final results Therefore I decided to focus on pitch and timbral
lexicon construction as the furthest levels of abstraction when processing songs for
my thesis. Mathematical details on how I constructed the pitch and timbral lexicons
can be found in the Mathematical Modeling chapter of this paper.
1.3 The Dataset
In order to successfully execute my thesis I need access to an extensive database of
music Until recently acquiring a substantial corpus of music data was a difficult and
costly task It is illegal to download music audio files from video and music-sharing
sites such as YouTube Spotify and Pandora Some platforms such as iTunes offer
90-second previews of songs but using only segments of songs and usually segments
that showcase the chorus of the song are not reliable measures to capture the entire
essence of a song Even if I were to legally download entire audio files for free I would
run into additional issues. Obtaining a high-quality corpus of song data would be
challenging writing scripts that crawl music sharing platforms may not capture all of
the music I am looking for And once I have the audio files I would have to perform
audio processing techniques to extract the relevant information from the songs
Fortunately there is an easy solution to the music data acquisition problem
The Million Song Dataset (MSD) is a collection of metadata for one million music
tracks dating up to 2011. Various organizations, such as The Echo Nest, MusicBrainz,
7digital, and Last.fm, have contributed different pieces of metadata. Each song is
represented as a Hierarchical Data Format file (HDF5) which can be loaded as a
JSON object The fields encompass topical features such as the song title artist
and release date as well as lower-level features such as the loudness starting beat
time pitches and timbre of several segments of the song [9] While the MSD is
the largest free and open source music metadata dataset I could find there is no
guarantee that it adequately covers the entire spectrum of EM artists and songs
This quality limitation is important to consider throughout the study A quick look
through the songs including the subset of data I worked with for this report showed
that there were several well-known artists and songs in the EM scene Therefore
while the MSD may not contain all desired songs for this project it contains an
adequate number of relevant songs to produce some meaningful results Additionally
laying the groundwork for modeling the similarities between songs and identifying
groundbreaking ones is the same regardless of the songs included and the following
methodologies can be implemented on any similarly-formatted dataset including one
with songs that might currently be missing in the MSD
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was and continues to be
an important aspect of my research My problem how to mathematically determine
which songs were unique for their time requires an algorithm in which each song is
introduced in chronological order either joining an existing category or starting a
new category based on its musical similarity to songs already introduced Clustering
algorithms like k-means or Gaussian Mixture Models (GMM), which optimize the
partitioning of a dataset into a predetermined number of clusters, assume that this
number is known in advance. While this process would work if we knew
exactly how many genres of EM existed, if we guess wrong our end results may end
up with clusters that are wrongly grouped together or separated. It is much better to
apply a clustering algorithm that does not make any assumptions about this number
One particularly promising process that addresses the issue of the number of
clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are
useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters and (2) by sorting the songs
in chronological order before running the algorithm and keeping track of which
songs are categorized under each cluster we can observe the earliest songs in each
cluster and consequently infer which songs were responsible for creating new
clusters. The DP is controlled by a single concentration parameter α.
The expected number of clusters formed is directly
proportional to the value of α so the higher the value of α the more likely new
clusters will be formed [10] Regardless of the value of α as the number of data
points introduced increases the probability of a new group being formed decreases
That is, a "rich get richer" policy is in place and existing clusters tend to grow in
size Tweaking the value of the tunable parameter α is an important part of the
study since it determines the flexibility given to forming a new cluster If the value
of α is too small then the criteria for forming clusters will be too strict and data
that should be in different clusters will be assigned to the same cluster On the other
hand if α is too large the algorithm will be too sensitive and assign similar songs to
different clusters
The implementation of the DP was achieved using scikit-learn's library and API for
Dirichlet Process Gaussian Mixture Model (DPGMM) The DPGMM is the formal
name of the Dirichlet Process model used to cluster the data. More specifically,
scikit-learn's implementation of the DPGMM uses the stick-breaking method,
one of several equally valid methods to assign songs to clusters [11] While the
mathematical details for this algorithm can be found at the following citation [12]
the most important aspects of the DPGMM are the arguments that the user can
specify and tune The first of these tunable parameters is the value α which is the
same parameter as the α discussed in the previous paragraph. As seen in Figure 2.1,
on the right side, properly tuning α is key to obtaining meaningful clusters. The
center image has α set to 0.01, which is too small and results in all of the data being
formed under one cluster On the other hand the bottom-right image has the same
data set and α set to 100 which does a better job of clustering On a related note
the figure also demonstrates the effectiveness of the DPGMM over the GMM On the
left side clearly the dataset contains 2 clusters but the GMM on the top-left image
assumes 5 clusters as a prior and consequently clusters the data incorrectly, while
the DPGMM manages to limit the data to 2 clusters
The second argument that the user inputs for the DPGMM is the data that
will be clustered The scikit-learn implementation takes the data in the format
of a nested list (N lists each of length m) where N is the number of data points
and m the number of features While the format of the data structure is relatively
straightforward choosing which numbers should be in the data was a challenge I
faced Selecting the relevant features of each song to be used in the algorithm will
be expounded upon in the next section, "Feature Selection".
The last argument that a user inputs for the scikit-learn DPGMM implementation
is an argument indicating the upper bound for the number of clusters. The
Dirichlet Process then determines the best number of clusters for the data between
1 and the upper bound Since the DPGMM is flexible enough to find the best value
I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to
modify the number of clusters formed
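As a minimal sketch of such a call: the DPGMM class used in this thesis has since been removed from scikit-learn, and BayesianGaussianMixture with a Dirichlet-process prior is the current equivalent. The synthetic two-blob data below is purely illustrative, and weight_concentration_prior plays the role of α.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# two well-separated synthetic clusters standing in for song feature vectors
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
               rng.normal(5.0, 0.5, (100, 2))])

# Dirichlet-process mixture: 50 is only an upper bound on the cluster count
dpgmm = BayesianGaussianMixture(
    n_components=50,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,  # the concentration parameter alpha
    max_iter=500,
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
n_used = len(np.unique(labels))  # far fewer clusters than the upper bound
```

Because the stick-breaking prior penalizes unused components, the fitted model concentrates its weight on a handful of clusters even though 50 were allowed.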
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features
to be used for clustering In other words when we organize the songs into clusters
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and
intuitively logical In the Million Song Dataset [9] each song is represented as a
JSON object containing several fields These fields are candidate features to be used
in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up"
by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade in, at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.86 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features my main goal was to use features that would most
likely yield meaningful results yet also be simple and make sense to the average
person. The definition of "meaningful" results is subjective, as every music listener
will have his or her own opinions as to what constitutes different types of music, but some
common features most people tend to differentiate songs by are pitch, rhythm, and
the types of instruments used The following specific fields provided in each song
object fall under these three terms
Pitch
- segments_pitches: a matrix of values indicating the strength of each pitch (or
note) at each discernible time interval
Rhythm
- beats_start: a vector of values indicating the start time of each beat
- time_signature: the time signature of the song
- tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
- segments_timbre: a matrix of values indicating the distribution of MFCC-like
features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor for
songs since it reveals patterns of notes that occur Additionally other research
papers that quantitatively examine songs, like Mauch's, look at pitch and employ a
procedure that allows all songs to be compared with the same metric Likewise
timbre is intuitively a reliable differentiating feature, since it reveals the amounts of
different tones: sounds that differ audibly despite having the same pitch.
Therefore segments_timbre is another feature that is considered in each song.
Finally we look at the candidate features for rhythm At first glance all of these
features appear to be useful as they indicate the rhythm of a song in one way or
another However none of these features are as useful as the pitch and timbre
features While tempo is one factor in differentiating genres of EDM and music in
general tempo alone is not a driving force of musical innovation Certain genres
of EDM like drum 'n' bass and happycore stand out for having very fast tempos
but the tempo is supplemented with a sound unique to the genre Conceiving new
arrangements of pitches combining instruments in new ways and inventing new
types of sounds are novel but speeding up or slowing down existing sounds is not
Including tempo as a feature could actually add noise to the model since many genres
overlap in their tempos And finally tempo is measured indirectly when the pitch
and timbre features are normalized for each song everything is measured in units of
"per second", so faster songs will have higher quantities of pitch and timbre features
each second. Time signature can be dismissed from the candidate features for the
same reason as tempo: many genres share the same time signature, and including
it in the feature set would only add more noise. beats_start looks like a more
promising feature since like segments_pitches and segments_timbre it consists of
a vector of values However difficulties arise when we begin to think how exactly
we can utilize this information Since each song varies in length we need a way to
compare songs of different durations on the same level One approach could be to
perform basic statistics on the distance between each beat for example calculating
the mean and standard deviation of this distance However the normalized pitch
and timbre information already capture this data Another possibility is detecting
certain patterns of beats which could differentiate the syncopated dubstep or glitch
music beats from the steady pulse of electro-house But once again every beat is
accompanied by a sound with a specific timbre and pitch so this feature would not
add any significantly new information
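The basic inter-beat statistics mentioned above reduce to a short computation over the gaps between consecutive beat times, e.g.:

```python
import numpy as np

def beat_statistics(beats_start):
    """Mean and standard deviation of the gaps between consecutive beats,
    a duration-independent summary of a song's pulse."""
    gaps = np.diff(beats_start)
    return gaps.mean(), gaps.std()

# a perfectly steady 120 BPM pulse has one beat every 0.5 seconds
mean_gap, std_gap = beat_statistics([0.0, 0.5, 1.0, 1.5, 2.0])
```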
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research I first needed to collect
all of the electronic songs in the Million Song Dataset The easiest reliable way to
achieve this was to iterate through each song in the database and save the information
for the songs where any of the artist genre tags in artist_mbtags matched with an
electronic music genre. While this measure was not fully accurate, because it looks at
the genre of the artist rather than the song, song-specific genre information was not
as easily accessible, so this indicator was nearly as good a substitute. To generate a
list of the genres that electronic songs would fall under, I manually searched through
a subset of the MSD to find all genres that seemed to be related to electronic music.
In the case of genres that were sometimes but not always electronic in nature such
as disco or pop I erred on the side of caution and did not include them in the list
of electronic genres. In these cases, false positives, such as primarily rock songs that
happen to have the disco label attached to the artist, could inadvertently be included
in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
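The tag-matching filter can be sketched as follows. The function name and the bytes-decoding detail are assumptions for illustration; in practice the artist_mbtags array is read from each song's HDF5 file.

```python
TARGET_GENRES = frozenset([
    'house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
    'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic',
])

def is_electronic(artist_mbtags):
    """Keep a song when any artist-level MusicBrainz tag matches one of the
    electronic genre labels (tags typically arrive as bytes from HDF5)."""
    tags = (t.decode() if isinstance(t, bytes) else t for t in artist_mbtags)
    return any(t.lower() in TARGET_GENRES for t in tags)
```

A song by an artist tagged with both 'techno' and 'rock' would be kept, reflecting the artist-level (rather than song-level) nature of the filter described above.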
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musi-
cally informed manner The study first takes the raw sound data and converts it into
a distribution of each pitch where 0 is no detection of the pitch and 1 the strongest
amount. Then it computes the most likely chord by comparing the 4 most
common types of chords in popular music (major, minor, dominant 7, and minor 7) to
the observed chord. The most common chords are represented as "template chords"
and contain 0rsquos and 1rsquos where the 1rsquos represent the notes played in the chord For
example using the note C as the first index the C major chord is represented as
CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is
computed over every template chord:
\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, σ_CT is the standard
deviation of the values in the chord, and the operations on c are analogous. Note
that the summation is over each individual pitch in the 12 pitch classes The chord
template with the highest value of ρ is selected as the chord for the time frame
After this is performed for each time frame the values are smoothed and then the
change between adjacent chords is observed The reasoning behind this step is that
by measuring the relative distance between chords rather than the chords themselves
all songs can be compared in the same manner even though they may have different
key signatures Finally the study takes the types of chord changes and classifies
them under 8 possible categories called "H-topics". These topics are more abstracted
versions of the chord changes that make more sense to a human, such as "changes
involving dominant 7th chords".
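The template-matching step can be sketched as follows, correlating each of the four template types, rotated to all 12 roots, against a chroma frame. This is a simplified stand-in for the thesis appendix code (which also adds small smoothing terms to the standard deviations), using a Pearson-style correlation.

```python
import numpy as np

# binary templates rooted at C; np.roll shifts them to the other 11 roots
TEMPLATES = {
    'maj':  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'min':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'dom7': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    'min7': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}

def most_likely_chord(chroma):
    """Return the (chord_type, root) pair whose template correlates best
    with a 12-dimensional chroma (pitch-strength) frame."""
    chroma = np.asarray(chroma, dtype=float)
    best, best_rho = None, -np.inf
    for name, template in TEMPLATES.items():
        for root in range(12):
            rho = np.corrcoef(np.roll(template, root), chroma)[0, 1]
            if rho > best_rho:
                best, best_rho = (name, root), rho
    return best

# a frame dominated by C, E, and G should match C major (root 0)
frame = [1.0, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.95, 0.1, 0.1, 0.1, 0.1]
```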
In my preliminary implementation of this method on an electronic dance music
corpus, I made a few modifications to Mauch's study. First, I smoothed out time
frames before computing the most probable chords, rather than smoothing the most
probable chords. I did this to save time and to reduce volatility in the chord
measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference,
which contains 935 time frames and lasts 212 seconds, 5 time frames is roughly
one second, which for preliminary testing appeared to be a good interval for each
time block Second as mentioned in the literature section I did not abstract the
chord changes into H-topics This decision also stemmed from time constraints since
deriving semantic chord meaning from EDM songs would require careful research
into the types of harmonies and sounds common in that genre of music Below I
included a high-level visualization of the pitch metadata found in a sample song
"Firestarter" by The Prodigy and how I converted the metadata into a chord change
vector that I could then feed into the Dirichlet Process algorithm
[Figure: the pitch-processing pipeline, illustrated on the first 5 time frames of
"Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix,
where N is the number of time frames in the song and 12 the number of pitch
classes. (2) Average the distribution of pitches over every block of 5 time frames.
(3) Calculate the most likely chord for each block using Spearman's rho (here,
F# major). (4) For two adjacent chords (here F# major to G# major: a
major-to-major change with a step size of 2, chord shift code 6), calculate the
change between them and increment its count in a table of chord change
frequencies (192 possible chord changes). The result is a 192-element vector
where chord_changes[i] is the number of times the chord change with code i
occurred in the song.]
A final step I took to normalize the chord change data was to divide the numbers by
the length of the song so that each songrsquos number of chord changes was measured per
second
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8] After collecting all song metadata I took a random
sample of 20 songs from each year starting at 1970 The reason I forced the sampling
to 20 randomly sampled songs from each year and did not take a random sample of
songs from all years at once was to prevent bias towards any type of sounds As seen
in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The
mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the
years is σ = 7.060. A "random sample" over all songs would almost certainly include
a disproportionate amount of more recent songs In order to not miss out on sounds
that may be more prevalent in older songs I required a set number of songs from each
year Next from each randomly selected song I selected 20 random timbre frames
in order to prevent any biases in data collection within each song In total there
were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames
using a Gaussian Mixture Model (GMM) varying the number of clusters from 10 to
100 and selecting the number of clusters with the lowest Bayes Information Criterion
(BIC) a statistical measure commonly used to calculate the best fitting model The
BIC was minimized at 46 timbre clusters I then re-ran the GMM with 46 clusters
and saved the mean values of each of the 12 timbre segments for each cluster formed
In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs each song now had the same 46 timbre clusters
but different frequencies in each song When reading in the metadata from each song
I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song. Finally,
as with the pitch data, I divided all observed counts by the duration of the song in
order to normalize each song's timbre counts.
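The BIC-driven choice of the number of timbre clusters can be sketched with scikit-learn's GaussianMixture. The synthetic frames below stand in for the 16,800 sampled 12-dimensional timbre frames, and the candidate range is kept small for the toy example (the thesis scanned 10 to 100 clusters).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# synthetic stand-in for the sampled 12-dimensional timbre frames:
# three clearly separated "timbre" blobs
frames = np.vstack([rng.normal(c, 0.3, (200, 12)) for c in (-3.0, 0.0, 3.0)])

# fit a GMM for each candidate cluster count and keep the lowest BIC
candidates = range(1, 8)
bics = [GaussianMixture(n_components=k, random_state=0).fit(frames).bic(frames)
        for k in candidates]
best_k = candidates[int(np.argmin(bics))]

# refit with the winning k; the cluster means form the timbre lexicon
lexicon = GaussianMixture(n_components=best_k, random_state=0).fit(frames).means_
```

Because BIC penalizes extra parameters, adding components beyond the true cluster count raises the score, so the scan settles on the most parsimonious model that fits the frames well.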
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the
data. For each song, I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list, giving each song a total of 238
features. However, there is a problem with this setup: the pitch data will inherently
dominate the clustering process, since it contains more than four times as many features
as the timbre data. While there is no built-in function in scikit-learn's DPGMM implementation to
give different weights to each feature, I considered another possibility to remedy
this discrepancy: duplicating the timbre vector a certain number of times and
concatenating that to the feature set of each song. While this strategy runs the risk
of corrupting the feature set and turning it into something that does not accurately
represent each song, it is important to keep in mind that even without duplicating
the timbre vector, the feature set consists of two separate feature sets concatenated
to each other. Therefore, timbre duplication appears to be a reasonable strategy to
weight pitch and timbre more evenly.
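Assuming the per-song features are NumPy arrays, the duplication strategy amounts to tiling the timbre vector before concatenating; the number of copies here is illustrative, not a value taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(2)
pitch = rng.random(192)   # stand-in chord-change frequencies
timbre = rng.random(46)   # stand-in timbre-cluster frequencies

# tile the timbre vector so its feature count is closer to the pitch vector's,
# giving both parts comparable influence on a distance-based clustering
n_copies = 4
features = np.concatenate([pitch, np.tile(timbre, n_copies)])
# 192 + 4 * 46 = 376 features per song
```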
After this modification, I tweaked a few more parameters before obtaining my
final results. Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second, but it also had the undesired effect
of making the data too small. Timbre and pitch frequencies per second were almost
always less than 10, and many times hovered as low as 0.002 for nonzero values.
Because all of the values were very close to each other, using common values of α
in the range of 0.1 to 1000-2000 was insufficient to push the songs into different
clusters. As a result, every song fell into the same cluster. Increasing the value
of α by several orders of magnitude, to well over 10 million, fixed the problem, but
this solution presented two problems. First, tuning α to experiment with different
ways to cluster the music would be problematic, since I would have to work with
an enormous range of possible values for α. Second, pushing α to such high values
is not appropriate for the Dirichlet Process. Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters, but a value
of α that high is in principle always assigning each new song to a new cluster. On
the other hand, varying α between 0.1 and 1000, for example, presents a much wider
range of flexibility when assigning clusters. While clustering may be possible by varying
the values of α an extreme amount with the data as it currently is, we would be using
the Dirichlet Process in a way it mathematically should not be used. Therefore,
multiplying all of the data by a constant value, so that we can work in the appropriate
range of α, is the ideal approach. After some experimentation, I found that k = 10 was
an appropriate scaling factor.
After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist
genre tags, not specific song tags, for each song, I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre, including
the generic term 'electronic'. There were some bands, mostly older ones from the
1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured
mostly non-electronic songs, I decided to exclude them from my study and generated
a blacklist of these music artists. While it was infeasible to look through
every single song and determine whether it was electronic or not, I was able to look
over the earliest songs in each cluster. These songs were the most important to verify
as electronic, because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for.
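The tag-based selection with an artist blacklist can be sketched as below; the tag list, blacklist entry, and song records are illustrative stand-ins, not the actual data:

```python
# a short stand-in for the full list of predetermined EM genre tags
EM_TAGS = {'house', 'techno', 'trance', 'ambient', 'electronic'}

# artists whose catalog is mostly non-electronic, excluded after manual review
BLACKLIST = {'Electric Light Orchestra'}

songs = [
    {'artist': 'Jean-Michel Jarre', 'tags': ['ambient', 'electronic']},
    {'artist': 'Electric Light Orchestra', 'tags': ['rock', 'electronic']},
    {'artist': 'Some Rock Band', 'tags': ['rock']},
]

def is_em_song(song):
    # keep a song only if some artist tag is an EM genre
    # and the artist has not been blacklisted
    return (song['artist'] not in BLACKLIST
            and any(tag in EM_TAGS for tag in song['tags']))

kept = [s['artist'] for s in songs if is_em_song(s)]
```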
The goal of this thesis is to identify different groups in which EM songs are
clustered and to identify the most unique artists and genres. While the second task is
very simple, because it requires looking at the earliest songs in each cluster, the
effectiveness of the first is difficult to gauge. While I can look at the average chord change
and timbre category frequencies in each cluster, as well as other metadata, attaching
more semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clustering for each value, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit of components, or clusters, allowed to 50. The values of α I used
resulted in 9, 14, and 19 clusters formed.
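These runs can be sketched with scikit-learn's BayesianGaussianMixture, the current replacement for the DPGMM class used in this thesis; the synthetic data, the ×10 scaling, and the α values mirror the setup described above, but this is an illustration rather than the exact pipeline:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
features = rng.random((300, 20)) * 0.01  # stand-in for the tiny per-second frequencies
scaled = features * 10                   # rescale so reasonable alphas can separate clusters

results = {}
for alpha in (0.05, 0.1, 0.2):
    dp = BayesianGaussianMixture(
        n_components=50,                                  # upper limit on clusters
        weight_concentration_prior=alpha,                 # the Dirichlet Process alpha
        weight_concentration_prior_type='dirichlet_process',
        random_state=0,
    ).fit(scaled)
    labels = dp.predict(scaled)
    results[alpha] = len(set(labels))  # number of clusters actually used
```

Larger α values encourage the stick-breaking prior to spread mass over more components, which is why the cluster counts grow with α in the runs reported below.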
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are
skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions
of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is
shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and since they did not sound unique,
I discarded them from the clusters). Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but with echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them
for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Very repetitive acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process, these songs were lumped together under the common theme of dense, melodic
sounds (as opposed to minimalist, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, has a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 and then increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre, a French composer and pioneer of ambient and
electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note
rhythm that, combined with the ethereal synths and their chord
progressions, sounded very similar to trance music. In fact, I found that stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself, but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change, type 60 to minor → minor with no note change, type
120 to dominant 7th major → dominant 7th major with no note change, and type 180 to
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that chords occurring next to each other in the song remain in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to interpret intuitively. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in mine. I ended up comparing my subjective
summaries of each cluster with the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data are very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value, but also comparing the clusters formed
to those for other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster
3 (for α = 0.1) contained a similar number of songs and a similar
distribution of release years to cluster 9 (for α = 0.05). Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05).
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 clusters differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 clusters picked
up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1)
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres can be grouped
together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6 (α = 0.1), has a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters for α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR drum machines, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this run, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters. 3 of these clusters contained only one song each; upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17 (α = 0.2), which contains Roland TR
drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
seem to vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also support the difficulty
in distinguishing different clusters. The y-axis values for all of the charts are quite small,
implying that many of the timbre values averaged out because the songs were quite
different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an inadequate job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table for α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings for
α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; then I offer potential paths for researchers to build upon my
experiment and closing remarks regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to all of the electronic
music that was available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included contained
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criterion
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayes Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed were nontrivial: I could not only
eyeball the measurements on each graph for timbre, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successful the Dirichlet
Process is at clustering songs into distinct categories.
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions, procedures,
evaluations of whether clustering was effective, and musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without extensive
funding and influence. Once these problems are resolved, and the dataset,
songs accessed from the dataset, and methods for comparing songs to each other are
settled, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there considerable
overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even more, and groups such as Spotify amass greater amounts of information
and deeper insights on that information, this relatively new field of study will hopefully
grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly analyzed from a curious outsider's perspective: qualitatively
described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1

    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean value of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished; writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
                    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
                    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
                    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
                    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
                    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 =  [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                        [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                        [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                        [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                        [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                        [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 =  [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                        [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                        [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                        [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                        [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                        [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [[ 1.38679881e-01,  3.95702571e-02,  2.65410235e-02,
                     7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
                     8.71851698e-03, -1.17595855e-02,  1.07227900e-02,
                     8.75951680e-03,  5.40391877e-03,  6.17638908e-03],
                   [ 3.14344510e+00,  1.17405599e-01,  4.08053561e+00,
                    -1.77934450e+00,  2.93367968e+00, -1.35597928e+00,
                    -1.55129489e+00,  7.75743158e-01,  6.42796685e-01,
                     1.40794256e-01,  3.37716831e-01, -3.27103815e-01],
                   [ 3.56548165e-01,  2.73288705e+00,  1.94355982e+00,
                     1.06892477e+00,  9.89739475e-01, -8.97330631e-02,
                     8.73234495e-01, -2.00747009e-03,  3.44488367e-01,
                     9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
                   [ 4.22442037e-01,  4.14115783e-01,  1.43926557e-01,
                    -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
                    -6.83151409e-02,  9.86816882e-02,  2.43219098e-02,
                     6.93558977e-02,  6.80121418e-03,  3.97485360e-02],
                   [ 1.94727799e-01, -1.39027782e+00, -2.39875671e-01,
                    -2.84583677e-01,  1.92334219e-01, -2.83421048e-01,
                     2.15787541e-01,  1.14840341e-01, -2.15631833e-01,
                    -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
                   [ 1.96565167e-01,  4.98702717e-02, -3.43697282e-01,
                     2.54170701e-01,  1.12441266e-02,  1.54740401e-01,
                    -4.70447408e-02,  8.10868802e-02,  3.03736697e-03,
                     1.43974944e-03, -2.75044913e-02,  1.48634678e-02],
                   [ 2.21364497e-01, -2.96205105e-01,  1.57754028e-01,
                    -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
                    -1.38139882e-01, -5.54936599e-02,  1.66886836e-01,
                     6.46238260e-02,  1.24093863e-02, -2.09274345e-02],
                   [ 2.12823455e-01, -9.32652720e-02, -4.39611467e-01,
                    -2.02814479e-01,  4.98638770e-02, -1.26572488e-01,
                    -1.11181799e-01,  3.25075635e-02,  2.01416694e-02,
                    -5.69216463e-02,  2.61922912e-02,  8.30817468e-02],
                   [ 1.62304042e-01, -7.34813956e-03, -2.02552550e-01,
                     1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
                    -6.20429191e-03, -6.08892354e-02,  1.02883628e-02,
                     3.84878478e-02, -8.72920419e-03,  2.37291230e-02],
                   [ 1.69023095e-01,  6.81311168e-02, -3.71039856e-02,
                    -2.13139780e-02, -4.18752028e-03,  1.36407740e-01,
                     2.58515825e-02, -4.10328777e-04,  2.93149920e-02,
                    -1.97874734e-02,  2.01177066e-02,  4.29260690e-03],
                   [ 4.16829358e-01, -1.28384095e+00,  8.86081556e-01,
                     9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
                    -3.19865507e-02, -1.71517045e-02,  3.47472066e-02,
                    -3.53047665e-02,  5.58354602e-02, -5.06222122e-02],
                   [ 3.83948137e-01,  1.06020034e-01,  4.01191058e-01,
                     1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
                     2.27589858e-02, -5.67352733e-02,  3.84666644e-02,
                    -2.15828055e-02, -1.67817151e-02,  1.15426241e-01],
                   [ 9.07946444e-01,  3.26120397e+00,  2.98472002e+00,
                    -1.42615404e-01,  1.29886103e+00, -4.53380431e-01,
                     1.54008478e-01, -3.55297093e-02, -2.95809181e-01,
                     1.57037690e-01, -7.29692046e-02,  1.15180285e-01],
                   [ 1.60870896e+00, -2.32038235e+00, -7.96211044e-01,
                     1.55058968e+00, -2.19377663e+00,  5.01030526e-01,
                    -1.71767279e+00, -1.36642470e+00, -2.42837527e-01,
                    -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
                   [ 6.42870687e-01,  1.34486839e+00,  2.16026845e-01,
                    -2.13180345e-01,  3.10866747e-01, -3.97754955e-01,
                    -3.54439151e-01, -5.95938041e-04,  4.95054274e-03,
                     4.67013422e-02, -1.80823854e-02,  1.25808320e-01],
                   [ 1.16780496e+00,  2.28141229e+00, -3.29418720e+00,
                    -1.54239912e+00,  2.12372153e-01,  2.51116768e+00,
                     1.84273560e+00, -4.06183916e-01,  1.19175125e+00,
                    -9.24407446e-01,  6.85444429e-01, -6.38729005e-01],
                   [ 2.39097414e-01, -1.13382447e-02,  3.06327342e-01,
                     4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
                     3.46533705e-02,  1.46440386e-02,  6.88291154e-02,
                     1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
                   [ 1.74850329e-01, -1.86077411e-01,  2.69285838e-01,
                     5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
                    -5.01920042e-03, -1.14565540e-02, -2.61300268e-03,
                    -6.94872458e-03,  1.20157063e-02,  2.01341977e-02],
                   [ 1.93220674e-01,  1.62738332e-01,  1.72794061e-02,
                     7.89933755e-02,  1.58494767e-01,  9.04541006e-04,
                    -3.33177052e-02, -1.42411500e-01, -1.90471155e-02,
                    -2.41622739e-02, -2.57382438e-02,  2.84895062e-02],
                   [ 3.31179197e+00, -1.56765268e-01,  4.42446188e+00,
                     2.05496297e+00,  5.07031622e+00, -3.52663849e-02,
                    -5.68337901e+00, -1.17825301e+00,  5.41756637e-01,
                    -3.15541339e-02, -1.58404846e+00,  7.37887234e-01],
                   [ 2.36033237e-01, -5.01380019e-01, -7.01568834e-02,
                    -2.14474169e-01,  5.58739133e-01, -3.45340886e-01,
                     2.36469930e-01, -2.51770230e-02, -4.41670143e-01,
                    -1.73364633e-01,  9.92353986e-03,  1.01775476e-01],
                   [ 3.13672832e+00,  1.55128891e+00,  4.60139512e+00,
                     9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
                    -3.00065797e+00, -4.41556909e-01, -7.77546208e-01,
                    -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
                   [ 8.50714148e-01,  2.28658856e-01, -3.65260753e+00,
                     2.70626948e+00, -1.90441544e-01,  5.66625676e+00,
                     1.77531510e+00,  2.39978921e+00,  1.10965660e+00,
                     1.58484130e+00, -1.51579214e-02,  8.64324026e-01],
                   [ 1.14302559e+00,  1.18602811e+00, -3.88130412e+00,
                     8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
                     8.56022598e-01, -1.08015106e+00,  1.74840192e-01,
                    -1.35493558e-02, -1.17012561e+00,  1.68572940e-01],
                   [ 3.54117814e+00,  6.12714769e-01,  7.67585243e+00,
                     2.50391333e+00,  1.81374399e+00, -1.46363231e+00,
                    -1.74027236e+00, -5.72924078e-01, -1.20787368e+00,
                    -4.13954661e-01, -4.62561948e-01,  6.78297871e-01],
                   [ 8.31843044e-01,  4.41635485e-01,  7.00724425e-02,
                    -4.72159900e-02,  3.08326493e-01, -4.47009822e-01,
                     3.27806057e-01,  6.52370380e-01,  3.28490360e-01,
                     1.28628172e-01, -7.78065861e-02,  6.91343399e-02],
                   [ 4.90082031e-01, -9.53180204e-01,  1.76970476e-01,
                     1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
                     3.91808304e-01,  2.19368239e-01, -2.06483291e-01,
                    -6.25044005e-02, -1.05547224e-01,  3.18934196e-01],
                   [ 1.49899454e+00, -4.30708817e-01,  2.43770498e+00,
                     7.03149621e-01, -2.28827845e+00,  2.70195855e+00,
                    -4.71484280e+00, -1.18700075e+00, -1.77431396e+00,
                    -2.23190236e+00,  8.20855264e-01, -2.35859902e-01],
                   [ 1.20322544e-01, -3.66300816e-01, -1.25699953e-01,
                    -1.21914056e-01,  6.93277338e-02, -1.31034684e-01,
                    -1.54955924e-03,  2.48094288e-02, -3.09576314e-02,
                    -1.66369415e-03,  1.48904987e-04, -1.42151992e-02],
                   [ 6.52394765e-01, -6.81024464e-01,  6.36868117e-01,
                     3.04950208e-01,  2.62178992e-01, -3.20457080e-01,
                    -1.98576098e-01, -3.02173163e-01,  2.04399765e-01,
                     4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
                   [ 2.06762180e-01, -2.08101829e-01,  2.61977630e-01,
                    -1.71672300e-01,  5.61794250e-02,  2.13660185e-01,
                     3.90259585e-02,  4.78176392e-02,  1.72812607e-02,
                     3.44052067e-02,  6.26899067e-03,  2.48544728e-02],
                   [ 7.39717363e-01,  4.37786285e+00,  2.54995502e+00,
                     1.13151212e+00, -3.58509503e-01,  2.20806129e-01,
                    -2.20500355e-01, -7.22409824e-02, -2.70534083e-01,
                     1.07942098e-03,  2.70174668e-01,  1.87279353e-01],
                   [ 1.25593809e+00,  6.71054880e-02,  8.70352571e-01,
                    -4.32607959e+00,  2.30652217e+00,  5.47476105e+00,
                    -6.11052479e-01,  1.07955720e+00, -2.16225471e+00,
                    -7.95770149e-01, -7.31804973e-01,  9.68935954e-01],
                   [ 1.17233757e-01, -1.23897829e-01, -4.88625265e-01,
                     1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
                    -1.17525019e-02,  5.70221674e-02, -7.67796123e-03,
                     4.17505873e-02, -2.33375716e-02,  1.94121001e-02],
                   [ 1.67511025e+00, -2.75436700e+00,  1.45345593e+00,
                     1.32408871e+00, -1.66172505e+00,  1.00560074e+00,
                    -8.82308160e-01, -5.95708043e-01, -7.27283590e-01,
                    -1.03975499e+00, -1.86653334e-02,  1.39449745e+00],
                   [ 3.20587677e+00, -2.84451104e+00,  8.54849957e+00,
                    -4.44001235e-01,  1.04202144e+00,  7.35333682e-01,
                    -2.48763292e+00,  7.38931361e-01, -1.74185596e+00,
                    -1.07581842e+00,  2.05759299e-01, -8.20483513e-01],
                   [ 3.31279737e+00, -5.08655734e-01,  6.61530870e+00,
                     1.16518280e+00,  4.74499155e+00, -2.31536191e+00,
                    -1.34016130e+00, -7.15381712e-01,  2.78890594e+00,
                     2.04189275e+00, -3.80003033e-01,  1.16034914e+00],
                   [ 1.79522019e+00, -8.13534697e-02,  4.37167420e-01,
                     2.26517020e+00,  8.85377295e-01,  1.07481514e+00,
                    -7.25322296e-01, -2.19309506e+00, -7.59468916e-01,
                    -1.37191387e+00,  2.60097913e-01,  9.34596450e-01],
                   [ 3.50400906e-01,  8.17891485e-01, -8.63487084e-01,
                    -7.31760701e-01,  9.70320805e-02, -3.60023996e-01,
                    -2.91753495e-01, -8.03073817e-02,  6.65930095e-02,
                     1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
                   [ 2.25922929e-01,  2.78461593e-01,  5.39661393e-02,
                    -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
                     2.31027499e-03,  5.87465112e-05,  1.86127188e-02,
                     2.83074747e-02, -1.87198676e-04,  1.24761782e-02],
                   [ 4.53615634e-01,  3.18976020e+00, -8.35029351e-01,
                     7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
                    -1.14521031e+00,  1.00044304e+00, -4.04084981e-01,
                    -4.86030348e-01,  1.05412721e-01,  5.63666445e-02],
                   [ 3.93714086e-01, -3.07226477e-01, -4.87366619e-01,
                    -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
                    -2.15591352e-01, -1.21332941e-01,  1.42245002e-01,
                     5.02984582e-02, -8.05878851e-03,  1.95534173e-01],
                   [ 1.86913010e-01, -1.61000977e-01,  5.95612425e-01,
                     1.87804293e-01,  2.22064227e-01, -1.09008289e-01,
                     7.83845058e-02,  5.15228647e-02, -8.18113578e-02,
                    -2.37860551e-02,  3.41013800e-03,  3.64680417e-02],
                   [ 3.32919314e+00, -2.14341251e+00,  7.20913997e+00,
                     1.76143734e+00,  1.64091808e+00, -2.66887649e+00,
                    -9.26748006e-01, -2.78599285e-01, -7.39434005e-01,
                    -3.87363085e-01,  8.00557250e-01,  1.15628886e+00],
                   [ 4.76496444e-01, -1.19334793e-01,  3.09037235e-01,
                    -3.45545294e-01,  1.30114716e-01,  5.06895559e-01,
                     2.12176840e-01, -4.14296750e-03,  4.52439064e-02,
                    -1.62163990e-02,  6.93683152e-02, -5.77607592e-03],
                   [ 3.00019324e-01,  5.43432074e-02, -7.72732930e-01,
                     1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
                    -2.10011388e-01,  2.78202425e-01,  6.16957205e-02,
                    -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
most common timbre rhythms, and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples covering all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but I would also have to pay careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources to carry out for this thesis. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to treat pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.
1.3 The Dataset
In order to successfully execute my thesis, I needed access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but segments of songs, and usually segments that showcase the chorus, are not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to apply audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, MusicBrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat times, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.
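The per-song HDF5 layout can be sketched with h5py. The group and field names below are illustrative stand-ins, not the actual MSD schema (the dataset ships its own hdf5_getters module for reading real files):

```python
import h5py
import numpy as np

# Write a toy file that loosely mimics a per-song MSD record: topical
# metadata plus per-segment pitch and timbre matrices. All values here
# are made up for illustration.
with h5py.File("toy_song.h5", "w") as f:
    meta = f.create_group("metadata")
    meta.attrs["title"] = "Example Track"
    meta.attrs["artist_name"] = "Example Artist"
    meta.attrs["year"] = 1987
    analysis = f.create_group("analysis")
    analysis.create_dataset("segments_pitches", data=np.random.rand(935, 12))
    analysis.create_dataset("segments_timbre", data=np.random.rand(935, 12))

# Read it back, the way one would pull fields from a real song file.
with h5py.File("toy_song.h5", "r") as f:
    pitches = f["analysis/segments_pitches"][:]
    title = f["metadata"].attrs["title"]
```

Each real MSD file follows the same pattern: fixed scalar attributes for song-level fields and (n_segments, 12) arrays for the pitch and timbre features used below.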
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this approach would work if we knew exactly how many genres of EM existed, if we guess wrong our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.
One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
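The "rich get richer" behavior can be seen in the Chinese Restaurant Process view of the DP, where each new song joins an existing cluster with probability proportional to that cluster's size, or starts a new cluster with probability proportional to α. A minimal sketch (not from the thesis code; the function name and seeds are illustrative):

```python
import random

def crp_partition(n_points, alpha, seed=0):
    """Simulate cluster sizes under a Chinese Restaurant Process."""
    rng = random.Random(seed)
    table_sizes = []  # table_sizes[k] = number of points in cluster k
    for n in range(n_points):
        # point n+1 starts a new cluster w.p. alpha / (n + alpha),
        # joins cluster k w.p. table_sizes[k] / (n + alpha)
        r = rng.random() * (n + alpha)
        if r < alpha:
            table_sizes.append(1)  # open a new cluster
        else:
            r -= alpha
            for k, size in enumerate(table_sizes):
                if r < size:       # chosen proportionally to current size
                    table_sizes[k] += 1
                    break
                r -= size
    return table_sizes
```

Running this with a small and a large α on the same number of points shows directly that the expected number of clusters grows with α, while in both cases later points tend to land in the already-large clusters.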
The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the Stick Breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same dataset and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
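The stick-breaking construction mentioned above draws the mixture weights by repeatedly breaking off Beta(1, α)-distributed fractions of a unit-length stick. A truncated sketch (an illustrative helper, not the thesis code):

```python
import random

def stick_breaking_weights(alpha, n_components, seed=0):
    """Truncated stick-breaking draw of Dirichlet Process mixture weights."""
    rng = random.Random(seed)
    remaining = 1.0  # length of stick still unbroken
    weights = []
    for _ in range(n_components):
        beta_k = rng.betavariate(1.0, alpha)  # fraction broken off this round
        weights.append(remaining * beta_k)
        remaining *= 1.0 - beta_k
    weights.append(remaining)  # mass left past the truncation point
    return weights
```

Small α concentrates nearly all of the mass on the first few components (few clusters); large α leaves more of the stick for later components, matching the behavior of the concentration parameter described above.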
The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."
The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
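In current scikit-learn releases the DPGMM class has been replaced by BayesianGaussianMixture with a Dirichlet Process prior; a minimal sketch of the three arguments discussed above (the data here is synthetic, not MSD features):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in for the song feature matrix: N points, m features each,
# drawn from two well-separated "genres".
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(200, 2)),
               rng.normal(5.0, 0.3, size=(200, 2))])

dpgmm = BayesianGaussianMixture(
    n_components=50,                                    # upper bound on clusters
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,                     # the concentration alpha
    max_iter=500,
    random_state=0,
)
labels = dpgmm.fit_predict(X)
```

The model keeps 50 components internally but drives most of their weights toward zero, so the two separated blobs end up in different effective clusters rather than all 50 being used.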
2.2 Feature Selection

[Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α.]

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features.
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4) (raw count of each of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of this artist on the service playme.com)
artist_terms: shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal-processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in, by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in, by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums, only one is given)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935) (max loudness during each segment)
segments_loudness_max_time: shape = (935) (time of the max loudness during each segment)
segments_loudness_start: shape = (935) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so the max is 1))
segments_start: shape = (935) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most
likely yield meaningful results yet also be simple and make sense to the average
person. The definition of "meaningful" results is subjective, as every music listener
has his or her own opinion as to what constitutes different types of music, but some
common features by which most people differentiate songs are pitch, rhythm, and
the types of instruments used. The following fields provided in each song
object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate as a differentiating factor for
songs, since it reveals the patterns of notes that occur. Additionally, other research
papers that quantitatively examine songs, like Mauch's, look at pitch and employ a
procedure that allows all songs to be compared with the same metric. Likewise,
timbre is intuitively a reliable differentiating feature, since it captures the presence
of different tones: sounds that differ even when they have the same pitch.
Therefore segments_timbre is another feature considered for each song.
Finally, we look at the candidate features for rhythm. At first glance all of these
features appear useful, as they indicate the rhythm of a song in one way or
another. However, none of them are as useful as the pitch and timbre
features. While tempo is one factor in differentiating genres of EDM and of music in
general, tempo alone is not a driving force of musical innovation. Certain genres
of EDM, like drum 'n' bass and happy hardcore, stand out for having very fast tempos,
but the tempo is supplemented with a sound unique to the genre. Conceiving new
arrangements of pitches, combining instruments in new ways, and inventing new
types of sounds are novel; speeding up or slowing down existing sounds is not.
Including tempo as a feature could actually add noise to the model, since many genres
overlap in their tempos. And finally, tempo is measured indirectly when the pitch
and timbre features are normalized for each song: everything is measured in units of
"per second," so faster songs will have higher quantities of pitch and timbre features
each second. Time signature can be dismissed from the candidate features for the
same reason as tempo: many genres share the same time signature, and including
it in the feature set would only add more noise. beats_start looks like a more
promising feature since, like segments_pitches and segments_timbre, it consists of
a vector of values. However, difficulties arise when we consider how exactly
we can utilize this information. Since songs vary in length, we need a way to
compare songs of different durations on the same level. One approach could be to
perform basic statistics on the distance between each beat, for example calculating
the mean and standard deviation of this distance. However, the normalized pitch
and timbre information already captures this data. Another possibility is detecting
certain patterns of beats, which could differentiate the syncopated dubstep or glitch
beats from the steady pulse of electro-house. But once again, every beat is
accompanied by a sound with a specific timbre and pitch, so this feature would not
add any significantly new information.
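The beat-interval statistics mentioned above can be computed directly from beats_start with NumPy; a minimal sketch, where the input array is an illustrative, perfectly steady 120 BPM pulse rather than data from the dataset:

```python
import numpy as np

def beat_interval_stats(beats_start):
    """Mean and standard deviation of the gaps between consecutive beats."""
    intervals = np.diff(np.asarray(beats_start, dtype=float))
    return intervals.mean(), intervals.std()

# Illustrative input: beats every 0.5 s, i.e. a steady 120 BPM pulse.
mean_gap, std_gap = beat_interval_stats([0.0, 0.5, 1.0, 1.5, 2.0])
print(mean_gap, std_gap)  # 0.5 0.0
```

A perfectly steady pulse yields zero standard deviation; syncopated beats would yield a larger one, which is the statistic this approach would have clustered on.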
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect
all of the electronic songs in the Million Song Dataset. The easiest reliable way to
achieve this was to iterate through each song in the database and save the information
for the songs where any of the artist genre tags in artist_mbtags matched an
electronic music genre. While this measure was not fully accurate, because it looks at
the genre of the artist rather than the song, specific genre information for each song was not
as easily accessible, so this indicator was nearly as good a substitute. To generate a
list of the genres that electronic songs would fall under, I manually searched through
a subset of the MSD to find all genres that seemed to be related to electronic music.
In the case of genres that were sometimes but not always electronic in nature, such
as disco or pop, I erred on the side of caution and did not include them in the list
of electronic genres; in these cases, false positives, such as primarily rock songs that
happen to have the disco label attached to the artist, could inadvertently be included
in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
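The tag-matching filter described above might be sketched as follows; the song records and the shortened genre set are hypothetical stand-ins for the actual MSD iteration:

```python
# Subset of the full target_genres list above, for illustration only.
target_genres = {'house', 'techno', 'trance', 'ambient', 'electronic'}

def is_electronic(song):
    """Keep a song if any artist-level musicbrainz tag matches an
    electronic genre (case-insensitive)."""
    return any(tag.lower() in target_genres
               for tag in song.get('artist_mbtags', []))

# Hypothetical records standing in for the per-song MSD iteration.
songs = [
    {'title': 'A', 'artist_mbtags': ['Techno', 'german']},
    {'title': 'B', 'artist_mbtags': ['rock']},
]
kept = [s['title'] for s in songs if is_electronic(s)]
print(kept)  # ['A']
```

As noted in the text, this keeps a song whenever the artist carries any electronic tag, which is why artists with mostly non-electronic catalogues later had to be blacklisted.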
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically
informed manner. The study first takes the raw sound data and converts it into
a distribution over the pitches, where 0 is no detection of a pitch and 1 the strongest
detection. Then it computes the most likely chord by comparing the 4 most
common types of chords in popular music (major, minor, dominant 7, and minor 7) against
the observed chord. The most common chords are represented as "template chords"
containing 0's and 1's, where the 1's represent the notes played in the chord. For
example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is
computed against every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − mean(CT)) (c_i − mean(c)) / (σ_CT σ_c)

where mean(CT) is the mean of the values in the template chord, σ_CT is the standard
deviation of those values, and the corresponding quantities for c are analogous. Note
that the summation runs over each of the 12 pitch classes. The chord
template with the highest value of ρ is selected as the chord for the time frame.
After this is performed for each time frame, the values are smoothed, and then the
change between adjacent chords is observed. The reasoning behind this step is that,
by measuring the relative distance between chords rather than the chords themselves,
all songs can be compared in the same manner even though they may have different
key signatures. Finally, the study takes the types of chord changes and classifies
them under 8 possible categories called "H-topics." These topics are more abstracted
versions of the chord changes that make more sense to a human, such as "changes
involving dominant 7th chords."
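The template-matching step can be sketched as follows, using SciPy's built-in Spearman correlation in place of the hand-written formula; the chroma frame here is illustrative, not taken from the dataset:

```python
import numpy as np
from scipy.stats import spearmanr

# Interval patterns (semitones above the root) for the four template chord types.
CHORD_TYPES = {
    'maj':  (0, 4, 7),
    'min':  (0, 3, 7),
    'dom7': (0, 4, 7, 10),
    'min7': (0, 3, 7, 10),
}
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def templates():
    """Yield (name, 12-element 0/1 template vector) for every root and type."""
    for root in range(12):
        for name, intervals in CHORD_TYPES.items():
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0
            yield f'{NOTES[root]}{name}', t

def best_chord(chroma):
    """Return the template chord with the highest Spearman's rho vs. the frame."""
    return max(templates(), key=lambda nt: spearmanr(nt[1], chroma).correlation)[0]

# Illustrative chroma frame with strong C, E, and G -- should match C major.
frame = np.array([1.0, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1, 0.1])
print(best_chord(frame))
```

The 48 templates (4 types × 12 roots) are exactly the "template chords" above; the frame's three strongest pitch classes fall on C, E, and G, so the C major template wins.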
In my preliminary implementation of this method on an electronic dance music
corpus, I made a few modifications to Mauch's study. First, I smoothed out time
frames before computing the most probable chords, rather than smoothing the most
probable chords. I did this to save time and to reduce volatility in the chord
measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference,
which contains 935 time frames and lasts 212 seconds, 5 time frames is roughly
1 second, and for preliminary testing this appeared to be a good interval for each
time block. Second, as mentioned in the literature section, I did not abstract the
chord changes into H-topics. This decision also stemmed from time constraints, since
deriving semantic chord meaning from EDM songs would require careful research
into the types of harmonies and sounds common in that genre of music. Below is
a high-level visualization of the pitch metadata found in a sample song,
"Firestarter" by The Prodigy, and how I converted the metadata into a chord change
vector that I could then feed into the Dirichlet Process algorithm.
[Figure: pitch-preprocessing pipeline for "Firestarter" by The Prodigy. Start with the raw pitch data, an N×12 matrix where N is the number of time frames and 12 the number of pitch classes (the first 5 time frames of the song are shown in the original figure). Average the distribution of pitches over every 5 time frames; calculate the most likely chord for each block using Spearman's rho (e.g. F# major, (0,1,0,0,0,0,1,0,0,0,1,0)); then, for each pair of adjacent chords (e.g. F# major to G major, a major-to-major shift with chord shift code 6), increment the corresponding count in a table of chord change frequencies (192 possible chord changes). The result is a final 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by
the length of the song, so that each song's number of chord changes was measured per
second.
2.3.3 Timbre Preprocessing
For timbre I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8]. After collecting all song metadata, I took a random
sample of 20 songs from each year starting at 1970. The reason I forced the sampling
to 20 randomly sampled songs from each year, and did not take a random sample of
songs from all years at once, was to prevent bias towards any type of sound. As seen
in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000: the
mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the
years is σ = 7.060. A "random sample" over all songs would almost certainly include
a disproportionate number of more recent songs. In order not to miss sounds
that may be more prevalent in older songs, I required a set number of songs from each
year. Next, from each randomly selected song I selected 20 random timbre frames,
in order to prevent any biases in data collection within each song. In total there
were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames
using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to
100 and selecting the number of clusters with the lowest Bayesian Information Criterion
(BIC), a statistical measure commonly used to select the best-fitting model. The
BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters
and saved the mean values of each of the 12 timbre dimensions for each cluster formed.
In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs, each song now had the same 46 timbre clusters,
with different frequencies in each song. When reading in the metadata from each song,
I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in the song. Finally,
as with the pitch data, I divided all observed counts by the duration of the song in
order to normalize each song's timbre counts.
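The BIC-driven cluster-count selection can be sketched with scikit-learn's current GaussianMixture API; the synthetic 12-dimensional frames and the small range of candidate counts are assumptions standing in for the 16,800 sampled timbre frames and the 10-100 sweep:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for the 12-dimensional timbre frames: three
# well-separated blobs, so the BIC should favor 3 components.
frames = np.vstack([rng.normal(loc=c, scale=0.2, size=(200, 12))
                    for c in (-5.0, 0.0, 5.0)])

def fit_best_gmm(X, k_range):
    """Fit a GMM for each candidate k and keep the one with the lowest BIC."""
    fits = [GaussianMixture(n_components=k, random_state=0).fit(X)
            for k in k_range]
    return min(fits, key=lambda g: g.bic(X))

best = fit_best_gmm(frames, range(1, 7))
print(best.n_components, best.means_.shape)
```

The saved `means_` array plays the role of the 46 cluster means described above: each row is the mean of the 12 timbre dimensions for one cluster.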
26
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data were processed, I ran the Dirichlet Process on the
data. For each song I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list, giving each song a total of 238
features. However, there is a problem with this setup: the pitch data will inherently
dominate the clustering process, since it contains almost 3 times as many features
as timbre. While there is no built-in function in scikit-learn's DPGMM process to
give different weights to each feature, I considered another possibility to remedy
this discrepancy: duplicating the timbre vector a certain number of times and
concatenating that to the feature set of each song. While this strategy runs the risk
of corrupting the feature set and turning it into something that does not accurately
represent each song, it is important to keep in mind that even without duplicating
the timbre vector, the feature set consists of two separate feature sets concatenated
to each other. Therefore, timbre duplication appears to be a reasonable strategy to
weigh pitch and timbre more evenly.
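The concatenation-and-duplication step can be sketched as follows. Note that the DPGMM class used in the thesis has since been replaced in scikit-learn by BayesianGaussianMixture with a Dirichlet-process prior; the random feature vectors and the duplication count of 4 are illustrative assumptions, not the thesis's actual data or weighting:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
n_songs = 300
pitch = rng.random((n_songs, 192))   # stand-in chord-change frequencies
timbre = rng.random((n_songs, 46))   # stand-in timbre-category frequencies

# Duplicate the timbre block so its 46 features weigh closer to the 192
# pitch features (4 * 46 = 184 -- an illustrative choice).
n_dup = 4
X = np.hstack([pitch] + [timbre] * n_dup)

dpgmm = BayesianGaussianMixture(
    n_components=50,                 # upper limit on clusters, as in the thesis
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.1,  # the alpha being tuned
    covariance_type='diag',          # duplicated columns make a full covariance singular
    max_iter=200,
    random_state=0,
).fit(X)
labels = dpgmm.predict(X)
print(X.shape, len(set(labels)))
```

The diagonal covariance is a design choice forced by the duplication trick: exactly repeated columns would make a full covariance matrix singular, while under a diagonal model duplication simply counts each timbre feature several times, which is the intended reweighting.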
After this modification, I tweaked a few more parameters before obtaining my
final results. Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second, but it also had the undesired effect
of making the data too small: timbre and pitch frequencies per second were almost
always less than 1.0, and many times hovered as low as 0.002 for nonzero values.
Because all of the values were very close to each other, using common values of α
in the range of 0.1 to 1000-2000 was insufficient to push the songs into different
clusters; as a result, every song fell into the same cluster. Increasing the value
of α by several orders of magnitude, to well over 10 million, fixed this, but the
solution presented two problems. First, tuning α to experiment with different
ways to cluster the music would be problematic, since I would have to work with
an enormous range of possible values for α. Second, pushing α to such high values
is not appropriate for the Dirichlet Process: extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters, but a value
of α that high is in principle always assigning each new song to a new cluster. On
the other hand, varying α between 0.1 and 1000, for example, presents a much wider
range of flexibility when assigning clusters. While clustering may be possible by varying
α an extreme amount with the data as it currently is, we would be using
the Dirichlet Process in a way it mathematically should not be used. Therefore,
multiplying all of the data by a constant value, so that we can work in the appropriate
range of α, is the ideal approach. After some experimentation I found that k = 10 was
an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue
with some of the earlier songs. Since I had only artist genre tags, not song-specific
tags, I chose songs based on whether any of the tags associated with the artist fell
under any electronic music genre, including the generic term 'electronic'. There were
some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra,
which had some electronic music but mostly featured rock, funk, disco, or another
genre. Given that these artists featured mostly non-electronic songs, I decided to
exclude them from my study and generate a blacklist of these artists. While it was
infeasible to look through every single song and determine whether it was electronic
or not, I was able to look over the earliest songs in each cluster. These songs were
the most important to verify as electronic, because early non-electronic songs could
end up forming new clusters and inadvertently create clusters with non-electronic
sounds that I was not looking for.
The goal of this thesis is to identify the different groups into which EM songs are
clustered and to identify the most unique artists and genres. While the second task
is very simple, because it requires looking at the earliest songs in each cluster, the
effectiveness of the first is difficult to gauge. While I can look at the average chord
change and timbre category frequencies in each cluster, as well as other metadata,
putting semantic interpretations on what the music actually sounds like, and determining
whether the music is clustered properly, is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clustering for each value, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit of components, or clusters, allowed to 50. The values of α I used
resulted in 9, 14, and 19 clusters, respectively.
3.2 Findings
3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters strictly sequentially, so cluster numbers 5, 7, and 10 are
skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table listing each cluster formed, the number of songs in that cluster, and descriptions
of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is
shown below.
Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial, space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and since neither sounded unique,
I discarded them from the clusters). Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but echo
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin, rhythmic emphasis on first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which was particularly unique-sounding, so I discarded them,
for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist, ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines that became extremely popular in 1980s and 90s dance
tracks [13]) coincides with when the instruments were first manufactured in 1980. Not
surprisingly, this cluster contained mostly songs from the 80s and 90s, and declined
slightly in the 2000s. However, there were a few songs in that cluster that came out
before 1980. While these songs did not clearly use the Roland TR machines, they
may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a
heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens: the cluster contained some songs with nearly all traditional rock instruments,
others with purely synths, and others somewhere in between, all of which would
normally be classified as different EM genres. However, under the Dirichlet Process
these songs were lumped together by the common theme of dense, melodic sounds
(as opposed to minimalistic, repetitive, or dissonant ones). The most prominent
artists among the earlier songs are Ashra and Jon Hassell, who composed several
melodic songs combining traditional instruments with synthesizers for a modern feel.
The other small cluster, number 8, has a more normal year distribution relative to
the entire MSD distribution and also consists of denser beats; another artist, Cabaret
Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains
virtually no songs before 1990 but then increases rapidly in popularity. This cluster
contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and
an equally strong drum-like beat. Given the emergence of trance in the 1990s, and
the fact that house music in the 1980s contained more minimalistic synths than house
music in the 1990s, this distribution of years makes sense. Looking at the earliest
artists in this cluster, one that accurately predates the later music in the cluster is
Jean-Michel Jarre, a French composer who pioneered ambient and electronic music
[14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated
synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal
synths. While the song sounds ambient at its normal speed, playing it at 1.5 times
normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined
with the ethereal synths and their chord progressions, sounded very similar to trance
music. In fact, I found that stylistically, trance music was comparable to house and
ambient music increased in speed. Trance was a term not used extensively until the
early 1990s, but ambient and house music were already mainstream by the 1980s, so
it would make sense that trance evolved in this manner. However, this insight could
serve as an argument that trance is not an innovative genre in and of itself, but is
rather a clever combination of two older genres. Lastly, we look at the timbre category
and chord change distributions for each cluster. In theory, these clusters should have
significantly different peaks of chord changes and timbre categories, reflecting different
pitch arrangements and instruments in each cluster. The type 0 chord change
corresponds to major → major with no note change; type 60, minor → minor with
no note change; type 120, dominant 7th major → dominant 7th major with no note
change; and type 180, dominant 7th minor → dominant 7th minor with no note
change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently
observed, because it implies that adjacent chords in the song remain in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs
and sounds that are closest to each timbre category and then playing the sounds and
attaching user-based interpretations based on several listeners [8]. While this strategy
worked in Mauch's study, given the time and resources at my disposal it was not
practical in mine. I ended up comparing my subjective summaries of each cluster
against the charts to see whether certain peaks in the timbre categories corresponded
to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very
similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the
graphs vary significantly and correspond to some of the observed differences in the
music. In summary, below are the most influential artists I found in the clusters
formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference between existing accounts and
mine, however, is that from a quantitatively computed perspective I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
For larger values of α it is worth not only looking at interesting phenomena in
the clusters formed for that specific value but also comparing them to the clusters
formed at other values of α. Since we are increasing the value of α, more clusters
will be formed, and the distinctions between clusters will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster
3(0.1) (i.e., cluster 3 when α = 0.1) contained a similar number of songs, and a similar
distribution of release years, to cluster 9(0.05). Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3(0.1), they were different from the earliest artists in cluster 9(0.05).
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
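The claim above, that increasing α yields more and finer clusters, follows from the Chinese-restaurant-process view of the Dirichlet Process: the prior expected number of clusters for n songs grows roughly as α log(1 + n/α). A minimal simulation of this prior behavior (plain NumPy on synthetic draws; the function `crp_num_clusters` is my own illustration, not part of the thesis code):

```python
import numpy as np

def crp_num_clusters(n, alpha, rng):
    """Simulate a Chinese Restaurant Process with n customers and
    concentration alpha; return the number of occupied tables (clusters)."""
    counts = []  # counts[k] = number of customers seated at table k
    for i in range(n):
        # existing table k is chosen w.p. counts[k]/(i+alpha);
        # a new table is opened w.p. alpha/(i+alpha)
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        table = rng.choice(len(counts) + 1, p=probs)
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
    return len(counts)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for alpha in (0.05, 0.1, 0.2, 1.0):
        ks = [crp_num_clusters(2000, alpha, rng) for _ in range(20)]
        print(alpha, np.mean(ks))  # the mean cluster count rises with alpha
```

The absolute counts in this simulation reflect only the prior; the 14 to 22 clusters reported in this chapter also depend on how strongly the pitch and timbre features separate the songs.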
While the α = 0.05 run differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked
up more nuanced instrumentation and mood differences. For example, cluster 16(0.1)
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds. Another cluster, 28(0.1), features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6(0.1), contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 consisted of vocal-free, ethereal space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New World, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. From these clusters, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music
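Whether a cluster from one run "maps over" to a cluster from another can be made precise with a contingency table between the two label assignments. A small sketch of that check (toy labels for illustration; the real inputs would be the per-song cluster assignments from the α = 0.05 and α = 0.1 runs):

```python
from collections import Counter

def best_match(labels_a, labels_b):
    """For each cluster in run A, return the run-B cluster that shares the
    most songs with it and the fraction of A's songs that cluster captures."""
    table = Counter(zip(labels_a, labels_b))  # contingency counts
    sizes = Counter(labels_a)
    matches = {}
    for (a, b), n in table.items():
        frac = n / sizes[a]
        if a not in matches or frac > matches[a][1]:
            matches[a] = (b, frac)
    return matches

# toy example: cluster 3 of run A maps cleanly onto cluster 9 of run B,
# while cluster 5 of run A is split evenly across two run-B clusters
run_a = [3, 3, 3, 3, 5, 5, 5, 5]
run_b = [9, 9, 9, 9, 1, 1, 2, 2]
print(best_match(run_a, run_b))  # cluster 3 matches with fraction 1.0, cluster 5 with only 0.5
```

A capture fraction near 1.0 corresponds to the "maps over nicely" cases above, while low fractions flag clusters that were split or merged between runs.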
Finally, we look at α = 0.2. With this parameter value the Dirichlet Process resulted
in 22 clusters; 3 of these clusters contained only one song each, and upon
listening to each of these songs I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters at the other two α values, like cluster 17(0.2), which contains Roland TR
drum machine sounds and is comparable to cluster 28(0.1). However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing clusters: the y-axis ranges for all of the years are quite small,
implying that many of the timbre values averaged out because the songs in each
cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as, for example, those of cluster 28(0.1). The only
exceptions were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
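The procedure used throughout this chapter, fitting a truncated Dirichlet Process mixture at a given α and then discarding clusters containing a single song, can be sketched with scikit-learn's modern `BayesianGaussianMixture` (the thesis-era implementation differed; synthetic features stand in for the chord-change and timbre vectors, and the function name `dp_cluster` is illustrative):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def dp_cluster(X, alpha, max_components=30, seed=0):
    """Fit a truncated Dirichlet Process GMM; relabel singleton clusters -1."""
    dpgmm = BayesianGaussianMixture(
        n_components=max_components,
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,  # the concentration parameter α
        covariance_type="diag",
        max_iter=500,
        random_state=seed,
    )
    labels = dpgmm.fit_predict(X)
    sizes = np.bincount(labels, minlength=max_components)
    return np.where(sizes[labels] > 1, labels, -1)  # drop one-song clusters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # three well-separated synthetic "genres" in a 12-dimensional feature space
    X = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 12))
                   for c in (-4.0, 0.0, 4.0)])
    for alpha in (0.05, 0.1, 0.2):
        labels = dp_cluster(X, alpha)
        print(alpha, len(set(labels) - {-1}), "clusters kept")
```

Because the truncation level caps the number of components, `max_components` must comfortably exceed the number of clusters the data supports, as it does for the 14 to 22 clusters observed here.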
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies to address
those weaknesses, then offer potential paths for researchers to build upon my
experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my
control but unrealistic to address given the time and resources I had. The largest
issue was the dataset I was working with. While the MSD contained roughly 23,000
electronic music songs according to my classifications, these songs did not come close
to covering all of the electronic music that was available. From looking through the
tracks I did see many important artists, meaning that there was some credibility to
the dataset. However, there were several other artists I was surprised to see missing,
and the artists included contained only a limited number of popular songs. Some
traditionally defined genres, like dubstep, were missing entirely from the dataset, and
the most recent songs came from the year 2010, which meant that the past 5 years
of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM
data is very difficult, arguably more so than for other genres, because songs may be
remixed by multiple artists, further blurring the line between original content and
modifications. For this reason I considered my thesis to be a proof of concept:
although the data I used may not be ideal, I was able to show that the Dirichlet
Process could be used with some amount of success to cluster songs based on their
metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
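The gap between the two filters is easy to see concretely: the artist-level filter used here admits every track by an artist with one matching tag, while a song-level filter (what the Last.fm tag set would enable) keeps only tracks whose own tags match. The record layout and helper names below are hypothetical, for illustration only:

```python
EM_GENRES = {'house', 'techno', 'trance', 'ambient', 'industrial'}

def filter_by_artist_tags(tracks):
    """Thesis-style filter: keep a track if its ARTIST has any EM tag."""
    return [t for t in tracks if EM_GENRES & set(t['artist_tags'])]

def filter_by_song_tags(tracks):
    """Stricter filter: keep a track only if the SONG itself has an EM tag."""
    return [t for t in tracks if EM_GENRES & set(t['song_tags'])]

tracks = [
    {'title': 'A', 'artist_tags': ['rock', 'ambient'], 'song_tags': ['rock']},
    {'title': 'B', 'artist_tags': ['rock', 'ambient'], 'song_tags': ['ambient']},
    {'title': 'C', 'artist_tags': ['pop'], 'song_tags': ['pop']},
]
# the artist-level filter lets the rock track 'A' slip in; the song-level one does not
print([t['title'] for t in filter_by_artist_tags(tracks)])  # ['A', 'B']
print([t['title'] for t in filter_by_song_tags(tracks)])    # ['B']
```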
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed were nontrivial: I could not only
eyeball the measurements on each graph for timbre, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is at clustering songs into distinct categories.
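The selection of 46 timbre categories rested on minimizing the Bayesian Information Criterion over candidate mixture sizes. That model-selection step can be sketched as follows (scikit-learn's `GaussianMixture`, with synthetic 12-dimensional "timbre frames" containing three planted components; not the thesis data):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pick_k_by_bic(X, candidate_ks, seed=0):
    """Fit a GMM for each candidate component count and return the count
    with the lowest Bayesian Information Criterion, plus all BIC scores."""
    bics = {}
    for k in candidate_ks:
        gmm = GaussianMixture(n_components=k, covariance_type='diag',
                              random_state=seed).fit(X)
        bics[k] = gmm.bic(X)
    return min(bics, key=bics.get), bics

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(300, 12))
                   for c in (-3.0, 0.0, 3.0)])
    best_k, bics = pick_k_by_bic(X, [1, 2, 3, 4, 5])
    print(best_k)  # BIC should recover the three planted components
```

BIC trades likelihood against parameter count, so it penalizes splitting a genuine timbre category into several near-duplicates, which is exactly the property the 46-category choice relied on.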
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether clustering was effective, and closer musical
scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps
the songs available in the Million Song Dataset, can be addressed with extensions
and modifications to the code base I created. The greater issue of building an
effective corpus of music data for the MSD and constantly updating it might be
addressed by soliciting such data from an organization like Spotify, but such an
endeavor is very ambitious and beyond the scope of any individual or small-group
research project without extensive funding and influence. Once these problems are
resolved, the songs are accessed from the dataset, and methods for comparing songs
to each other are in place, the next steps would be to further analyze the results.
How do the most unique artists for their time compare to the most popular artists?
Is there considerable overlap? How long does it take for a style to grow in popularity,
if it even does? And lastly, how can these findings be used to compose new genres of
music and envision who and what will become popular in the future? All of these
questions may require supplementary information sources, with respect to the
popularity of songs and artists for example, and many of these additional pieces of
information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even further and groups such as Spotify amass greater amounts of information,
and deeper insights on that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has, up
until now, been analyzed mostly from a curious outsider's perspective: qualitatively
described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import collections
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the pitch and timbre metadata of each electronic song
out of the raw Million Song Dataset files.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# sort songs chronologically by year (an OrderedDict preserves the sorted order)
all_song_data_sorted = collections.OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# the re.sub pattern was partially lost in extraction; assumed to strip path separators
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub(r'/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# mean of a list (applied column-wise below via map(mean, zip(*...)))
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# the regex pattern was partially lost in extraction; assumed to match one
# song dictionary literal (starting with 'title') per iteration
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden for local runs
# pattern partially lost in extraction; assumed to match one song
# dictionary literal per iteration
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # keep each song with probability N / (songs in its year), capped at 1
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except ValueError:  # fewer than k frames in the song
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
# centroids of the 46 timbre categories (12-dimensional timbre vectors)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235
'''helper methods to process raw msd data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
'''given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (chord type, root index); (1, 1) is a placeholder default
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / (
                (stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-35, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
run into additional issues. Obtaining a high-quality corpus of song data would be challenging: scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to apply audio processing techniques to extract the relevant information from the songs.
Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, MusicBrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat times, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to keep in mind throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain every song desired for this project, it contains an adequate number of relevant songs to produce meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be applied to any similarly formatted dataset, including one with songs that are currently missing from the MSD.
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume that number is fixed in advance. While this process would work if we knew exactly how many genres of EM existed, a wrong guess may leave us with clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.
One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
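The "rich get richer" dynamic and the role of α can be illustrated with a small simulation of the closely related Chinese Restaurant Process. This is a sketch for intuition only; the thesis itself uses scikit-learn's stick-breaking implementation, and the function name crp_clusters is my own:

```python
import random

def crp_clusters(n_songs, alpha, seed=0):
    """Assign n_songs to clusters one at a time, Chinese-Restaurant style:
    song i starts a new cluster with probability alpha / (i + alpha) and
    otherwise joins an existing cluster with probability proportional to
    that cluster's current size ("rich get richer")."""
    rng = random.Random(seed)
    sizes = []  # sizes[k] = number of songs currently in cluster k
    for i in range(n_songs):
        r = rng.uniform(0, i + alpha)
        if r < alpha or not sizes:
            sizes.append(1)  # the song founds a new cluster
        else:
            acc = alpha
            for k, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[k] += 1  # the song joins cluster k
                    break
            else:
                sizes[-1] += 1  # guard against floating-point edge cases
    return sizes

# higher alpha -> more clusters for the same number of songs
few = crp_clusters(500, 0.1)
many = crp_clusters(500, 100.0)
```

With 500 songs, α = 0.1 typically yields only a handful of clusters, while α = 100 yields well over a hundred, matching the behavior described above.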
The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid ways to assign songs to clusters [11]. While the mathematical details of this algorithm can be found in the cited reference [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is α, the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The
center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same dataset and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the advantage of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs to the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should go into the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection".
The last argument that a user inputs to the scikit-learn DPGMM implementation indicates the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused on tuning α to modify the number of clusters formed.
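As a rough sketch of this setup: in scikit-learn releases since 0.18 the DPGMM class has been replaced by BayesianGaussianMixture with a Dirichlet-process prior, but the arguments mirror the three described above (α, the data, and the upper bound on components). The synthetic feature matrix here is illustrative only:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# stand-in for the song feature matrix: N rows of m features each,
# drawn from two well-separated groups
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 5)),
               rng.normal(8.0, 1.0, size=(100, 5))])

dp = BayesianGaussianMixture(
    n_components=50,                                  # upper bound, as in the text
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,                   # the concentration alpha
    random_state=0,
)
labels = dp.fit(X).predict(X)
n_clusters_used = len(set(labels))   # far fewer clusters than the bound of 50
```

The model is given 50 components but, thanks to the DP prior, only a small number of them end up with any songs assigned.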
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters,
Figure 2.1: scikit-learn example of GMM vs. DPGMM and the tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags shape = (4) (this artist received 4 tags on musicbrainzorg)
artist_mbtags_count shape = (4)
(raw tag count of the 4 tags this artist received on musicbrainzorg)
artist_name Rick Astley (artist name)
artist_playmeid 1338 (the ID of that artist on the service playmecom)
artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest
(number between 0 and 1))
artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest
(number between 0 and 1))
audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for
the analysis by The Echo Nest)
bars_confidence shape = (99) (confidence value (between 0 and 1) associated
with each bar by The Echo Nest)
bars_start shape = (99) (start time of each bar according to The Echo Nest this
song has 99 bars)
beats_confidence shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start shape = (397) (start time of each beat according to The Echo Nest
this song has 397 beats)
danceability 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1, 0 => not analyzed))
duration 211.69587 (duration of the track in seconds)
end_of_fade_in 0.139 (time of the end of the fade in, at the beginning of the song, according to The Echo Nest)
energy 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1, 0 => not analyzed))
key 1 (estimation of the key the song is in by The Echo Nest)
key_confidence 0.324 (confidence of the key estimation)
loudness -7.75 (general loudness of the track)
mode 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence 0.434 (confidence of the mode estimation)
release Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums, we give only one)
release_7digitalid 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence shape = (10) (confidence value (between 0 and 1) associated
with each section by The Echo Nest)
sections_start shape = (10) (start time of each section according to The Echo
Nest this song has 10 sections)
segments_confidence shape = (935) (confidence value (between 0 and 1) asso-
ciated with each segment by The Echo Nest)
segments_loudness_max shape = (935) (max loudness during each segment)
segments_loudness_max_time shape = (935) (time of the max loudness
during each segment)
segments_loudness_start shape = (935) (loudness at the beginning of each
segment)
segments_pitches shape = (935 12) (chroma features for each segment (normal-
ized so max is 1))
segments_start shape = (935) (start time of each segment ( musical event or
onset) according to The Echo Nest this song has 935 segments)
segments_timbre shape = (935 12) (MFCC-like features for each segment)
similar_artists shape = (100) (a list of 100 artists (their Echo Nest ID) similar
to Rick Astley according to The Echo Nest)
song_hotttnesss 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.86 (on a scale of 0 to 1))
song_id SOCWJDB12A58A776AF (The Echo Nest song ID note that a song can
be associated with many tracks (with very slight audio differences))
start_of_fade_out 198.536 (start time of the fade out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence shape = (794) (confidence value (between 0 and 1) associated
with each tatum by The Echo Nest)
tatums_start shape = (794) (start time of each tatum according to The Echo
Nest this song has 794 tatums)
tempo 113.359 (tempo in BPM according to The Echo Nest)
time_signature 4 (time signature of the song according to The Echo Nest ie
usual number of beats per bar)
time_signature_confidence 0.634 (confidence of the time signature estimation)
title Never Gonna Give You Up (song title)
track_7digitalid 8707738 (the ID of this song on the service 7digitalcom)
track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener has their own opinions about what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the prevalence of different tones, sounds that differ even when they have the same pitch. Therefore, segments_timbre is another feature that is considered for each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of them are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second", so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly
we can utilize this information. Since songs vary in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of that distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
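For concreteness, the "basic statistics" approach to beats_start dismissed above would look something like this (illustrative only; a steady pulse gives an inter-beat standard deviation near zero, information the normalized pitch and timbre rates already capture):

```python
def interbeat_stats(beats_start):
    """Mean and (population) standard deviation of the gaps between beats."""
    gaps = [b - a for a, b in zip(beats_start, beats_start[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return mean, var ** 0.5

# a perfectly steady 120 BPM pulse: one beat every 0.5 seconds
beats = [i * 0.5 for i in range(100)]
mean_gap, std_gap = interbeat_stats(beats)   # 0.5 and 0.0
```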
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for any song where one of the artist genre tags in artist_mbtags matched an electronic music genre. This measure is not fully accurate because it looks at the genre of the artist, not the song; however, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that are sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In those cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
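The tag-matching filter described above can be sketched as follows (the song dictionaries and the abridged genre set are made up for illustration; in the actual pipeline the tags come from each HDF5 file's artist_mbtags field):

```python
TARGET_GENRES = {'house', 'techno', 'trance', 'dubstep', 'electronic'}  # abridged

def is_electronic(artist_mbtags):
    """Keep a song if any of its artist's genre tags is a target genre."""
    return any(tag.lower() in TARGET_GENRES for tag in artist_mbtags)

songs = [
    {'title': 'Song A', 'artist_mbtags': ['techno', 'german']},
    {'title': 'Song B', 'artist_mbtags': ['rock', 'pop']},
]
kept = [s for s in songs if is_electronic(s['artist_mbtags'])]   # only Song A
```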
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over the 12 pitches, where 0 means no detection of the pitch and 1 the strongest detection. It then computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as
CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:
ρ(CT, c) = Σ_{i=1}^{12} (CT_i − mean(CT)) (c_i − mean(c)) / (σ_CT · σ_c)
where mean(CT) is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation runs over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".
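The template-matching step can be sketched for the major-chord templates as follows (a simplified Pearson-style correlation stands in for the full Spearman computation; rotating the C-rooted template generates the other 11 roots, and the function names are my own):

```python
C_MAJOR = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # C, E, G, with C as index 0

def rotate(template, root):
    """Transpose a C-rooted template up by `root` semitones."""
    return [template[(i - root) % 12] for i in range(12)]

MAJOR_TEMPLATES = [rotate(C_MAJOR, r) for r in range(12)]

def correlation(template, chroma):
    """Correlation between a template chord and a chroma frame, mirroring
    the rho above; the +0.01 terms guard against zero standard deviations."""
    mt, mc = sum(template) / 12, sum(chroma) / 12
    st = (sum((t - mt) ** 2 for t in template) / 12) ** 0.5
    sc = (sum((c - mc) ** 2 for c in chroma) / 12) ** 0.5
    num = sum((t - mt) * (c - mc) for t, c in zip(template, chroma))
    return num / ((st + 0.01) * (sc + 0.01))

def best_major_root(chroma):
    """Index (0 = C, ..., 11 = B) of the best-matching major chord."""
    return max(range(12), key=lambda r: correlation(MAJOR_TEMPLATES[r], chroma))

# a chroma frame with strong C, E, and G should match C major (root 0)
frame = [1.0, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.95, 0.1, 0.1, 0.1, 0.1]
```

The full procedure repeats this comparison over the minor, dominant 7th, and minor 7th template families and keeps the overall best match.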
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed the time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, in preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: pitch-processing pipeline, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N x 12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho (e.g. F major). (4) For each pair of adjacent chords, calculate the change between them (e.g. F major to G major: major to major, step size 2, chord shift code 6) and increment its count in a table of chord change frequencies (192 possible chord changes), i.e. chord_changes[6] += 1. The result is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
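The thesis does not spell out the exact numbering of the 192 chord-change codes, but since there are 4 chord types and 12 possible root shifts, 4 × 4 × 12 = 192, one plausible encoding (a hypothetical layout, with the normalization step included) is:

```python
CHORD_TYPES = ['major', 'minor', 'dom7', 'min7']  # the 4 template families

def change_code(type_from, type_to, root_shift):
    """Map a chord change to one of 4 * 4 * 12 = 192 integer codes."""
    return (type_from * 4 + type_to) * 12 + (root_shift % 12)

# count changes, then normalize by song duration (changes per second)
chord_changes = [0] * 192
chord_changes[change_code(0, 0, 2)] += 1   # e.g. F major -> G major (shift 2)
duration = 212.0                            # seconds, illustrative
per_second = [c / duration for c in chord_changes]
```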
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model [8] to find a meaningful way to compare timbre uniformly across all songs. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of recent songs. In order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but different frequencies for each. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to, and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song, in order to normalize each song's timbre counts.
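The BIC-driven choice of the number of timbre clusters can be sketched with scikit-learn's GaussianMixture. Synthetic 12-dimensional "timbre frames" stand in for the real data here, and the scan range is abridged from the 10-100 used in the thesis:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# fake timbre frames drawn from three distinct "sounds"
X = np.vstack([rng.normal(m, 0.5, size=(200, 12)) for m in (0.0, 4.0, 8.0)])

best_k, best_bic = None, np.inf
for k in range(1, 11):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic = gmm.bic(X)        # lower BIC = better fit/complexity trade-off
    if bic < best_bic:
        best_k, best_bic = k, bic

# refit with the winning k and keep the per-cluster means, as in the text
final_gmm = GaussianMixture(n_components=best_k, random_state=0).fit(X)
cluster_means = final_gmm.means_   # one 12-element mean vector per cluster
```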
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data were processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as the timbre data. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another way to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating the copies to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
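A sketch of the duplication strategy (the number of copies is my own choice for illustration; four copies brings the 46 timbre features close to the 192 pitch features, since 4 × 46 = 184):

```python
N_CHORD_FEATURES = 192   # chord-change frequencies
N_TIMBRE_FEATURES = 46   # timbre-category frequencies

def build_feature_vector(chord_vec, timbre_vec, timbre_copies=4):
    """Concatenate the pitch and timbre features, duplicating the shorter
    timbre block so both carry comparable weight in the clustering."""
    return list(chord_vec) + list(timbre_vec) * timbre_copies

chords = [0.0] * N_CHORD_FEATURES
timbre = [0.0] * N_TIMBRE_FEATURES
features = build_feature_vector(chords, timbre)   # 192 + 4 * 46 = 376 features
```

With timbre_copies=1 this reduces to the plain 238-feature concatenation described above.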
After this modification, I tweaked a few more parameters before obtaining my
final results. Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second, but it also had the undesired effect
of making the data too small. Timbre and pitch frequencies per second were almost
always less than 10, and many times hovered as low as 0.002 for nonzero values.
Because all of the values were very close to each other, using common values of α
in the range of 0.1 to 1000-2000 was insufficient to push the songs into different
clusters; as a result, every song fell into the same cluster. Increasing the value
of α by several orders of magnitude, to well over 10 million, fixed this, but the
fix presented two problems. First, tuning α to experiment with different
ways to cluster the music would be problematic, since I would have to work with
an enormous range of possible values for α. Second, pushing α to such high values
is not appropriate for the Dirichlet Process. Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters, but a value
of α that high is in principle always assigning each new song to a new cluster. On
the other hand, varying α between 0.1 and 1000, for example, presents a much wider
range of flexibility when assigning clusters. While clustering may be possible by
varying α an extreme amount with the data as it currently is, we would be using
the Dirichlet Process in a way it should mathematically not be used. Therefore,
multiplying all of the data by a constant value, so that we can work in the appropriate
range of α, is the ideal approach. After some experimentation, I found that k = 10
was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found
that there was a slight issue with some of the earlier songs. Since I had only artist
genre tags, not specific song tags for each song, I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre, including
the generic term 'electronic'. There were some bands, mostly older ones from the
1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured
mostly non-electronic songs, I decided to exclude them from my study and generate
a blacklist of these artists. While it was infeasible to look through
every single song and determine whether it was electronic or not, I was able to look
over the earliest songs in each cluster. These songs were the most important to verify
as electronic, because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for.
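The artist-level exclusion can be sketched as a simple filter. The blacklist contents and song entries below are illustrative (the thesis names Electric Light Orchestra as one such artist):

```python
# Hypothetical blacklist of artists whose catalogs are mostly non-electronic;
# the entries here are illustrative placeholders.
blacklist = {'Electric Light Orchestra'}

songs = [
    {'artist_name': 'Electric Light Orchestra', 'year': 1975},
    {'artist_name': 'Kraftwerk', 'year': 1978},
]

# Keep only songs by artists not on the blacklist.
filtered = [s for s in songs if s['artist_name'] not in blacklist]
```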
The goal of this thesis is to identify different groups in which EM songs are
clustered and to identify the most unique artists and genres. While the second task is
very simple, because it requires looking at the earliest songs in each cluster, the first
is difficult to gauge the effectiveness of. While I can look at the average chord change
and timbre category frequencies in each cluster, as well as other metadata, attaching
more semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clustering in each case, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit of components, or clusters, allowed to 50. The three values of α
I used resulted in 9, 14, and 19 clusters, respectively.
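A run of this kind can be sketched with scikit-learn's BayesianGaussianMixture, the modern successor to the DPGMM class named above; the toy data, the prior value, and the variable names below are illustrative assumptions, not the thesis code:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Toy stand-in for the per-song feature matrix (songs x features),
# scaled by k = 10 as described in the Methodology.
rng = np.random.default_rng(0)
X = 10 * rng.random(size=(200, 10)) * 0.002

# weight_concentration_prior plays the role of alpha; n_components is
# the upper limit of 50 clusters used in the thesis.
dp = BayesianGaussianMixture(
    n_components=50,
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.1,
    random_state=0,
).fit(X)

labels = dp.predict(X)
n_clusters = len(np.unique(labels))  # clusters the model actually used
```

Raising `weight_concentration_prior` encourages the truncated Dirichlet Process to spread mass over more of the 50 available components, mirroring the effect of α described above.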
3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are
skipped).

Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions
of pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is
shown below.

Cluster | Song Count | Characteristic Sounds
0       | 6481       | Minimalist, industrial, space sounds, dissonant chords
1       | 5482       | Soft, New Age, ethereal
2       | 2405       | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3       | 360        | Very dense and complex synths, slightly darker tone
4       | 4550       | Heavily distorted rock and synthesizer
6       | 2854       | Faster-paced 80s synth rock, acid house
8       | 798        | Aggressive beats, dense house music
9       | 1464       | Ambient house, trancelike, strong beats, mysterious tone
11      | 1597       | Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and since they did not sound unique,
I discarded them from the clusters). Again, the song distributions, timbre and pitch
distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0       | 1339       | Instrumental and disco with 80s synth
1       | 2109       | Simultaneous quarter-note and sixteenth-note rhythms
2       | 4048       | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3       | 1353       | Strong repetitive beats, ambient
4       | 2446       | Strong simultaneous beat and synths; synths defined but echo
5       | 2672       | Calm, New Age
6       | 542        | Hi-hat cymbals, dissonant chord progressions
7       | 2725       | Aggressive punk and alternative rock
9       | 1647       | Latin, rhythmic emphasis on first and third beats
11      | 835        | Standard medium-fast rock instruments/chords
16      | 1152       | Orchestral, especially violins
18      | 40         | "Martian alien" sounds, no vocals
20      | 1590       | Alternating strong kick and strong high-pitched clap
28      | 528        | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them,
for a total of 19 significant clusters. Again, the song distributions, timbre and pitch
distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0       | 4075       | Nostalgic and sad-sounding synths and string instruments
1       | 2068       | Intense, sad, cavernous (mix of industrial metal and ambient)
2       | 1546       | Jazz/funk tones
3       | 1691       | Orchestral with heavy 80s synths, atmospheric
4       | 343        | Arpeggios
5       | 304        | Electro, ambient
6       | 2405       | Alien synths, eerie
7       | 1264       | Punchy kicks and claps, 80s/90s tilt
8       | 1561       | Medium tempo, 4/4 time signature, synths with intense guitar
9       | 1796       | Disco rhythms and instruments
10      | 2158       | Standard rock with few (if any) synths added on
12      | 791        | Cavernous, minimalist, ambient (non-electronic instruments)
14      | 765        | Downtempo, classic guitar riffs, fewer synths
16      | 865        | Classic acid house sounds and beats
17      | 682        | Heavy Roland TR sounds
22      | 14         | Fast, ambient, classic orchestral
23      | 578        | Acid house with funk tones
30      | 31         | Very repetitive rhythms, one or two tones
34      | 88         | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of
which would normally be classified as different EM genres. However, under the
Dirichlet Process these songs were lumped together with the common theme of dense,
melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats;
another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced
16th-note rhythm that, combined with the ethereal synths and their chord
progressions, sounded very similar to trance music. In fact, I found that, stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
music was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres.

Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change; type 60, minor → minor with no note change; type
120, dominant 7th major → dominant 7th major with no note change; and type 180,
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that chords occurring next to each other in a song are remaining in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to intuitively interpret. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in mine. I ended up comparing my subjective
summaries of each cluster against the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data is very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value but also comparing the clusters formed
to those of other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster
3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar
distribution of release years to cluster 9_0.05. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 group differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked
up more nuanced instrumentation and mood differences. For example, cluster 16_0.1
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machines, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. From this run, I
added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters; 3 of these clusters contained only one song each, and upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR
drum machine sounds and is comparable to cluster 28_0.1. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster
often vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing different clusters: the y-axis values for all of the charts are quite small,
implying that many of the timbre values averaged out because the songs were quite
different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as cluster 28_0.1's, for example. The only exceptions
were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small amount of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
adequately clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and resulting in clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; then I offer potential paths for researchers to build upon my
experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there
were various factors working against it, some beyond my control and others within
my control but unrealistic to address given the time and resources I had. The largest
issue was the dataset I was working with. While the MSD contained roughly 23,000
electronic music songs according to my classifications, these songs did not come close
to all of the electronic music that was available. From looking through the tracks, I
did see many important artists, meaning that there was some credibility to the dataset.
However, there were several other artists I was surprised to see missing, and the artists
included contained only a limited number of popular songs. Some traditionally defined
genres, like dubstep, were missing entirely from the dataset, and the most recent songs
came from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept: although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists contain only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayes Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the measurements on each graph for timbre, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection contained careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is at effectively clustering songs into distinct categories.
4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions,
procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, and the dataset,
songs accessed from the dataset, and methods for comparing songs to each other are
in place, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists, for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even more, and groups such as Spotify amass greater amounts of information
and deeper insights from that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A

Code

A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

# note: path separators and some quoted literals below were lost in
# typesetting and have been reconstructed
basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# the regex below was garbled in the original listing and is reconstructed:
# it steps through the per-song dictionaries written out by the previous script
for json_object_str in re.finditer(r"\{'title'.*?\}\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 =  [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                        [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                        [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                        [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                        [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                        [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 =  [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                        [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                        [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                        [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                        [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                        [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
Chapter 2
Mathematical Modeling
2.1 Determining Novelty of Songs
Finding a logical and implementable mathematical model was, and continues to be,
an important aspect of my research. My problem, how to mathematically determine
which songs were unique for their time, requires an algorithm in which each song is
introduced in chronological order, either joining an existing category or starting a
new category based on its musical similarity to songs already introduced. Clustering
algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the
partitioning of a dataset into a predetermined number of clusters, assume a fixed
number of clusters. While this process would work if we knew exactly how many
genres of EM existed, if we guess wrong our end results may contain clusters that
are wrongly grouped together or separated. It is much better to apply a clustering
algorithm that does not make any assumptions about this number.
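This failure mode of fixing the number of clusters in advance is easy to demonstrate. The sketch below is illustrative only, using synthetic two-dimensional data rather than the song features: scikit-learn's k-means is given a deliberately wrong guess of five clusters on data that clearly contains two.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# synthetic stand-in for song features: two well-separated groups ("genres")
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(5.0, 0.5, size=(100, 2))])

# k-means must return exactly the number of clusters it was told to find
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
n_found = len(set(labels))  # always 5: the guess, not the data, decides
```

The two real groups are split into five pieces, which is precisely the kind of wrongly separated clustering described above.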
One particularly promising approach that addresses the issue of the number of
clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are
useful for this particular application because (1) they assign clusters to a dataset
with only an upper bound on the number of clusters, and (2) by sorting the songs
in chronological order before running the algorithm and keeping track of which
songs are categorized under each cluster, we can observe the earliest songs in each
cluster and consequently infer which songs were responsible for creating new
clusters. The DP is controlled by a concentration parameter α. The expected
number of clusters formed is directly proportional to the value of α, so the higher
the value of α, the more likely new clusters will be formed [10]. Regardless of the
value of α, as the number of data points introduced increases, the probability of a
new group being formed decreases. That is, a "rich get richer" policy is in place,
and existing clusters tend to grow in size. Tweaking the value of the tunable
parameter α is an important part of the study, since it determines the flexibility
given to forming a new cluster. If the value of α is too small, then the criteria for
forming clusters will be too strict, and data that should be in different clusters will
be assigned to the same cluster. On the other hand, if α is too large, the algorithm
will be too sensitive and will assign similar songs to different clusters.
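The "rich get richer" dynamic and the role of α can be made concrete with a small simulation. The sketch below is an illustration of the sequential assignment scheme (the Chinese Restaurant Process view of the DP), not the thesis code: each incoming song joins an existing cluster with probability proportional to that cluster's current size, or founds a new cluster with probability proportional to α.

```python
import random

def crp_assign(n_points, alpha, seed=0):
    """Sequentially assign points to clusters, Chinese-Restaurant-Process style."""
    rng = random.Random(seed)
    cluster_sizes = []  # cluster_sizes[k] = number of points in cluster k
    labels = []
    for i in range(n_points):
        # total weight: i points already assigned, plus alpha for a new cluster
        r = rng.uniform(0, i + alpha)
        if r < alpha or not cluster_sizes:
            cluster_sizes.append(1)  # found a new cluster
            labels.append(len(cluster_sizes) - 1)
        else:
            r -= alpha
            for k, size in enumerate(cluster_sizes):
                if r < size:  # join cluster k with probability proportional to size
                    cluster_sizes[k] += 1
                    labels.append(k)
                    break
                r -= size
            else:
                # guard against floating-point edge cases at the boundary
                cluster_sizes[-1] += 1
                labels.append(len(cluster_sizes) - 1)
    return labels, cluster_sizes

labels, sizes = crp_assign(500, alpha=1.0)
```

Because the chance of founding a new cluster at step i is α/(i + α), new clusters become rarer as more songs are seen, while the expected total number of clusters still grows with α (roughly α log n).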
The implementation of the DP was achieved using scikit-learn's library and API for
the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the
Dirichlet Process model used to cluster the data. More specifically, scikit-learn's
implementation of the DPGMM uses the stick-breaking method, one of several
equally valid methods to assign songs to clusters [11]. While the mathematical
details for this algorithm can be found at the following citation [12], the most
important aspects of the DPGMM are the arguments that the user can specify and
tune. The first of these tunable parameters is the value α, the same parameter as
the α discussed in the previous paragraph. As seen in Figure 2.1, on the right side,
properly tuning α is key to obtaining meaningful clusters. The center image has α
set to 0.01, which is too small and results in all of the data being placed in one
cluster. On the other hand, the bottom-right image has the same data set and α set
to 100, which does a better job of clustering. On a related note, the figure also
demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the
dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5
clusters as a prior and consequently clusters the data incorrectly, while the
DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs for the DPGMM is the data that
will be clustered. The scikit-learn implementation takes the data in the format
of a nested list (N lists, each of length m), where N is the number of data points
and m the number of features. While the format of the data structure is relatively
straightforward, choosing which numbers should be in the data was a challenge I
faced. Selecting the relevant features of each song to be used in the algorithm will
be expounded upon in the next section, "Feature Selection."
The last argument that a user inputs for the scikit-learn DPGMM implementation
is the upper bound for the number of clusters. The Dirichlet Process then
determines the best number of clusters for the data between 1 and the upper
bound. Since the DPGMM is flexible enough to find the best value, I set an
arbitrary upper bound of 50 clusters and focused on the tuning of α to modify the
number of clusters formed.
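The scikit-learn DPGMM class used at the time of writing has since been removed from the library; in current scikit-learn releases the same stick-breaking Dirichlet-process mixture is exposed as BayesianGaussianMixture. A minimal sketch with synthetic stand-in data (not the song features) shows the same three choices discussed above, the concentration α, the N × m data, and the 50-component upper bound:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# stand-in for the song feature matrix: N = 200 points, m = 3 features,
# drawn from two well-separated Gaussians
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 3)),
               rng.normal(5.0, 0.5, size=(100, 3))])

dpgmm = BayesianGaussianMixture(
    n_components=50,                                      # upper bound on clusters
    weight_concentration_prior_type='dirichlet_process',  # stick-breaking DP
    weight_concentration_prior=1.0,                       # the alpha discussed above
    max_iter=500,
    random_state=0,
)
labels = dpgmm.fit_predict(X)
# the DP prior leaves most of the 50 components with negligible weight,
# so far fewer than 50 clusters are actually used
n_used = len(set(labels))
```

The fitted model effectively prunes unused components, which is why an arbitrary upper bound of 50 is safe.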
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features
to be used for clustering. In other words, when we organize the songs into clusters,

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

we need to ensure that each cluster is distinct in a way that is statistically and
intuitively logical. In the Million Song Dataset [9], each song is represented as a
JSON object containing several fields. These fields are candidate features to be used
in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up"
by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most
likely yield meaningful results, yet also be simple and make sense to the average
person. The definition of "meaningful" results is subjective, as every music listener
will have his or her own opinion as to what constitutes different types of music, but
some common features most people tend to differentiate songs by are pitch, rhythm,
and the types of instruments used. The following specific fields provided in each
song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the prevalence of different tones, i.e. sounds that sound different despite having the same pitch. Therefore segments_timbre is another feature that is considered in each song.
Finally, we look at the candidate features for rhythm. At first glance all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second.
each second Time signature can be dismissed from the candidates features for the
same reason as tempo many genres contain the same time signature and including
it in the feature set would only add more noise beats_start looks like a more
promising feature since like segments_pitches and segments_timbre it consists of
a vector of values However difficulties arise when we begin to think how exactly
19
we can utilize this information Since each song varies in length we need a way to
compare songs of different durations on the same level One approach could be to
perform basic statistics on the distance between each beat for example calculating
the mean and standard deviation of this distance However the normalized pitch
and timbre information already capture this data Another possibility is detecting
certain patterns of beats which could differentiate the syncopated dubstep or glitch
music beats from the steady pulse of electro-house But once again every beat is
accompanied by a sound with a specific timbre and pitch so this feature would not
add any significantly new information
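For illustration, the rejected inter-beat statistics would amount to something like the following sketch; the beat times here are hypothetical, standing in for a song's beats_start vector:

```python
import numpy as np

def beat_statistics(beats_start):
    """Mean and standard deviation of the gaps between consecutive beats."""
    gaps = np.diff(np.asarray(beats_start))  # inter-beat intervals in seconds
    return gaps.mean(), gaps.std()

# A perfectly steady 120 BPM pulse has a 0.5 s gap between beats.
mean_gap, std_gap = beat_statistics([0.0, 0.5, 1.0, 1.5, 2.0])
print(mean_gap, std_gap)  # 0.5 0.0
```

As the text notes, this summary is largely redundant once pitch and timbre counts are normalized per second, which is why beats_start was ultimately dropped.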
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
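The tag-matching filter described above can be sketched as follows; the song records and the abbreviated genre set are illustrative stand-ins for the MSD's per-song HDF5 fields, not the real data:

```python
# Abbreviated stand-in for the thesis's target_genres list.
target_genres = {'house', 'techno', 'trance', 'ambient', 'electronic'}

def is_electronic(artist_mbtags):
    """Keep a song if any of its artist's genre tags is an electronic genre."""
    return any(tag.lower() in target_genres for tag in artist_mbtags)

# Hypothetical song records; the real pipeline reads artist_mbtags
# from each song's HDF5 file in the Million Song Dataset.
songs = [
    {'title': 'A', 'artist_mbtags': ['rock', 'pop']},
    {'title': 'B', 'artist_mbtags': ['Techno', 'german']},
]
electronic = [s['title'] for s in songs if is_electronic(s['artist_mbtags'])]
print(electronic)  # ['B']
```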
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_Cmaj = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

ρ_{CT,c} = (1 / (12 σ_CT σ_c)) Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄)

where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
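To make the template-matching step concrete, here is a small sketch of the chord selection described above, restricted to major-chord templates for brevity (the full procedure also uses minor, dominant 7, and minor 7 templates); the chroma frame shown is hypothetical:

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind='stable')] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        ranks[x == v] = ranks[x == v].mean()
    return ranks

def spearman_rho(a, b):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ra, rb = ra - ra.mean(), rb - rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))

# Major-chord templates rooted at each of the 12 pitch classes (C = index 0).
C_MAJOR = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])
TEMPLATES = {root: np.roll(C_MAJOR, root) for root in range(12)}

def best_chord(chroma):
    """Select the template with the highest rho against the chroma frame."""
    return max(TEMPLATES, key=lambda r: spearman_rho(TEMPLATES[r], chroma))

# Hypothetical frame with energy concentrated on C, E, and G:
frame = np.array([0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1])
print(best_chord(frame))  # 0 (the C major template)
```

The rank-based correlation is computed by hand here so the sketch depends only on NumPy; scipy.stats.spearmanr would give the same values.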
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly over 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: the pitch preprocessing pipeline, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every 5 time frames. (3) For each block, calculate the most likely template chord using Spearman's rho, e.g. F# major = (0,1,0,0,0,0,1,0,0,0,1,0). (4) For each pair of adjacent chords, calculate the change between them (e.g. F# major to G# major is a major-to-major change of step size 2, chord shift code 6) and increment its count in a table of the 192 possible chord change frequencies: chord_changes[6] += 1. The result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
2.3.3 Timbre Preprocessing
For timbre I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
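The cluster-count selection by BIC can be sketched with scikit-learn's GaussianMixture as below; the data here is a synthetic stand-in for the 16,800 sampled 12-dimensional timbre frames (three well-separated blobs rather than the MSD data), so the BIC lands on 3 components instead of 46:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for the sampled timbre frames: three separated blobs.
frames = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 12))
                    for c in (-5.0, 0.0, 5.0)])

def pick_n_clusters(X, candidates):
    """Fit a GMM per candidate count; keep the count with the lowest BIC."""
    bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
            for k in candidates}
    return min(bics, key=bics.get)

best_k = pick_n_clusters(frames, range(1, 7))
print(best_k)  # 3 for this toy data; the thesis scanned 10-100 and got 46
```

The chosen model's predict() then labels each timbre frame with its most likely cluster, which is what the per-song frequency counts are built from.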
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
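A minimal sketch of this feature construction, assuming a duplication factor of 4 (an illustrative choice, not a value fixed by the text, picked here because 4 × 46 ≈ 192):

```python
import numpy as np

def song_features(chord_changes, timbre_counts, timbre_copies=4):
    """Concatenate the 192 chord-change and 46 timbre frequencies,
    repeating the timbre block so both feature families carry
    comparable weight in the clustering."""
    return np.concatenate([chord_changes] + [timbre_counts] * timbre_copies)

chords = np.zeros(192)   # placeholder per-second chord-change frequencies
timbre = np.ones(46)     # placeholder per-second timbre-cluster frequencies
x = song_features(chords, timbre)
print(x.shape)  # (376,) = 192 + 4 * 46
```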
After this modification I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, and a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, putting more semantic interpretations on what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
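The clustering runs can be sketched as follows. scikit-learn's original DPGMM class has since been removed from the library; BayesianGaussianMixture with a Dirichlet-process prior is its modern replacement, and the data here is a small synthetic stand-in for the scaled 238-feature song vectors:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Toy stand-in for the scaled song feature vectors: three separated groups.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(150, 8))
               for c in (-4.0, 0.0, 4.0)])

def cluster_songs(X, alpha, max_clusters=50):
    """Truncated Dirichlet Process GMM: alpha is the concentration
    parameter and max_clusters the truncation level (the thesis
    capped the number of components at 50)."""
    model = BayesianGaussianMixture(
        n_components=max_clusters,
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        max_iter=500,
        random_state=0,
    ).fit(X)
    return model.predict(X)

labels = cluster_songs(X, alpha=0.1, max_clusters=10)
# Number of occupied clusters is typically far below the truncation level.
print(len(np.unique(labels)))
```

Repeating the fit with α = 0.05, 0.1, and 0.2 reproduces the kind of sweep described above, with larger α tending to occupy more of the available components.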
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster Song Count Characteristic Sounds
0 6481 Minimalist industrial space sounds, dissonant chords
1 5482 Soft, New Age, ethereal
2 2405 Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 360 Very dense and complex synths, slightly darker tone
4 4550 Heavily distorted rock and synthesizer
6 2854 Faster-paced 80s synth rock, acid house
8 798 Aggressive beats, dense house music
9 1464 Ambient house, trancelike, strong beats, mysterious tone
11 1597 Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since neither sounded unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster Song Count Characteristic Sounds
0 1339 Instrumental and disco with 80s synth
1 2109 Simultaneous quarter-note and sixteenth-note rhythms
2 4048 Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 1353 Strong repetitive beats, ambient
4 2446 Strong simultaneous beat and synths; synths defined but echoing
5 2672 Calm, New Age
6 542 Hi-hat cymbals, dissonant chord progressions
7 2725 Aggressive punk and alternative rock
9 1647 Latin, rhythmic emphasis on first and third beats
11 835 Standard medium-fast rock instruments/chords
16 1152 Orchestral, especially violins
18 40 "Martian alien" sounds, no vocals
20 1590 Alternating strong kick and strong high-pitched clap
28 528 Roland TR-like beats; kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which was particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster Song Count Characteristic Sounds
0 4075 Nostalgic and sad-sounding synths and string instruments
1 2068 Intense, sad, cavernous (mix of industrial metal and ambient)
2 1546 Jazz/funk tones
3 1691 Orchestral with heavy 80s synths, atmospheric
4 343 Arpeggios
5 304 Electro, ambient
6 2405 Alien synths, eerie
7 1264 Punchy kicks and claps, 80s/90s tilt
8 1561 Medium tempo, 4/4 time signature, synths with intense guitar
9 1796 Disco rhythms and instruments
10 2158 Standard rock with few (if any) synths added on
12 791 Cavernous, minimalist ambient (non-electronic instruments)
14 765 Downtempo, classic guitar riffs, fewer synths
16 865 Classic acid house sounds and beats
17 682 Heavy Roland TR sounds
22 14 Fast ambient, classic orchestral
23 578 Acid house with funk tones
30 31 Very repetitive rhythms, one or two tones
34 88 Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3 (α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9 (α = 0.05). Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 clusters differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 clusters picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together by certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process produced
22 clusters; 3 of these clusters contained only one song each, and upon
listening to each of these songs I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to differentiate subjectively, this one was
quite difficult. Slightly more than half of the clusters (10 out of 19) contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters formed under the other two α values, like cluster 17.02, which contains Roland TR
drum machine sounds and is comparable to cluster 28.01. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster
often vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord-change and timbre charts also reflect the difficulty
of distinguishing clusters: the y-axes for all of the years are quite small,
implying that many of the timbre values averaged out because the songs in each cluster
were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28.01, for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small fraction of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters that a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
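The role α plays in these comparisons can be illustrated with a small simulation of the clustering prior underlying the Dirichlet Process, the Chinese Restaurant Process. This is a sketch of the prior's "rich get richer" behavior only, with illustrative α values; it is not a rerun of the experiment, where cluster counts also depend on the Gaussian likelihood of the pitch and timbre features.

```python
import random

def crp_partition(n_songs, alpha, rng):
    """Simulate cluster assignments under a Chinese Restaurant Process.

    Each new song joins an existing cluster with probability proportional
    to that cluster's current size ("rich get richer"), or founds a new
    cluster with probability proportional to alpha.
    """
    cluster_sizes = []
    for i in range(n_songs):
        # probability of a new cluster shrinks as i grows: alpha / (i + alpha)
        if rng.random() < alpha / (i + alpha):
            cluster_sizes.append(1)  # this song founds a new cluster
        else:
            # pick an existing cluster, weighted by its current size
            r = rng.random() * i
            acc = 0
            for k, size in enumerate(cluster_sizes):
                acc += size
                if r < acc:
                    cluster_sizes[k] += 1
                    break
    return cluster_sizes

def mean_num_clusters(n_songs, alpha, trials=200, seed=0):
    """Average cluster count over repeated simulations."""
    rng = random.Random(seed)
    return sum(len(crp_partition(n_songs, alpha, rng))
               for _ in range(trials)) / trials

# larger alpha -> more clusters expected, as described in the text
for alpha in (0.2, 1.0, 5.0):
    print(alpha, mean_num_clusters(500, alpha))
```

Running the loop shows the monotonic relationship between α and the expected number of clusters that motivated comparing α = 0.05, 0.1, and 0.2.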
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies for addressing
those weaknesses; I then offer potential paths for researchers to build upon my
experiment, and finish with closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there
were various limiting factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the electronic
music available. From looking through the tracks, I did see many important
artists, meaning that the dataset had some credibility. However, there were
several other artists I was surprised to see missing, and the artists included were represented by
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist carried a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
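A song-level filter along these lines might look like the following sketch. The file layout and field names here are my assumptions about the Last.fm companion dataset (one JSON object per track with a list of tag/weight pairs), and the tag list and weight threshold are illustrative, not part of the original experiment.

```python
import json

# illustrative subset of EM genre tags, mirroring the artist-level list
EM_TAGS = {'house', 'techno', 'trance', 'dubstep', 'drum and bass',
           'breakbeat', 'downtempo', 'idm', 'ambient', 'electronic'}

def is_em_track(lastfm_json_path, min_weight=50):
    """Return True if this track's own Last.fm tags include an EM genre.

    Filtering at the track level avoids keeping, say, a rock song just
    because its artist also recorded electronic material.
    """
    with open(lastfm_json_path) as f:
        track = json.load(f)
    for tag, weight in track.get('tags', []):
        # assumed format: weights are strings from 0-100; keep confident tags
        if tag.lower() in EM_TAGS and int(weight) >= min_weight:
            return True
    return False
```

The same predicate could replace the artist-tag check in the data-pulling script in Appendix A.1, keyed by track ID.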
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the measurements on each graph for timbre, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successful the Dirichlet
Process is at clustering songs into distinct categories.
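The BIC-based choice of the number of timbre categories mentioned above can be sketched with scikit-learn. The data here is a synthetic stand-in for 12-dimensional timbre frames (three well-separated blobs), not the thesis's actual frames, so the scan recovers 3 rather than 46 components; the selection logic is the same.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic stand-in for 12-dimensional timbre frames: three separated blobs
frames = np.vstack([rng.normal(loc=c, scale=0.25, size=(200, 12))
                    for c in (-2.0, 0.0, 2.0)])

# fit mixtures with increasing numbers of components, recording BIC for each
candidate_ks = list(range(1, 8))
bics = []
for k in candidate_ks:
    gmm = GaussianMixture(n_components=k, covariance_type='diag',
                          random_state=0)
    gmm.fit(frames)
    bics.append(gmm.bic(frames))

# the number of timbre categories is where BIC is minimized
best_k = candidate_ks[int(np.argmin(bics))]
print('BIC is minimized at', best_k, 'components')
```

BIC trades off fit against model size, so the scan stops rewarding extra components once the true structure is captured; on real timbre frames the same loop would range over larger candidate counts.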
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD, and constantly updating it, might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small research group without extensive
funding and influence. Once these problems are resolved, with the dataset built,
its songs accessible, and methods for comparing songs to each other in place,
the next step would be to analyze the results further. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even further, and groups such as Spotify amass greater amounts of information
and deeper insights into that information, this relatively new field of study will hopefully
grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has, up
until now, mostly been analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
    'industrial', 'synthpop', 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# match each song's metadata dict in the raw file
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c/json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg
                   in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t/json_object['duration'] for t
                                            in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
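To make the chord-change encoding in the listing above concrete, the index arithmetic can be checked in isolation. This is a standalone restatement of the formula, not part of the original code; a chord is the (quality, root) pair returned by find_most_likely_chord, and 4 × 4 quality pairs times 12 root shifts gives the 192 categories.

```python
def chord_shift_index(c1, c2):
    """Map a transition between two chords to a single category index.

    A chord is a (quality, root) pair: quality 1-4 (major, minor, dom7,
    min7) and root 0-11 (semitones above C). There are 4*4 = 16 quality
    pairs and 12 root shifts, giving 16 * 12 = 192 categories.
    """
    quality1, root1 = c1
    quality2, root2 = c2
    if root1 == root2:
        note_shift = 0
    elif root1 < root2:
        note_shift = root2 - root1
    else:
        note_shift = 12 - root1 + root2  # wrap upward around the octave
    key_shift = 4 * (quality1 - 1) + quality2  # ranges over 1..16
    return 12 * (key_shift - 1) + note_shift   # ranges over 0..191

# C major -> G major: same quality, root up 7 semitones
print(chord_shift_index((1, 0), (1, 7)))  # category 7
```

Because note_shift always measures the upward distance modulo 12, every transition lands in exactly one of the 192 bins that the chord_changes histogram counts.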
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012
[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide
[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014
[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014
[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014
[6] About the music genome project httpwwwpandoracomaboutmgp
[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012
[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015
[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011
[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011
[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from
68
the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013
[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012
[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999
[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters.
The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α: the higher the value of α, the more likely new clusters are to be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
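The "rich get richer" dynamic described above is exactly the Chinese Restaurant Process view of the DP: each new point joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to α. The following is a minimal illustrative simulation, not code from the thesis; the function name and seeding are my own.

```python
import random

def crp_cluster_sizes(n_points, alpha, seed=0):
    """Simulate CRP seating: point n joins cluster k with probability
    size_k / (n + alpha), or starts a new cluster with probability
    alpha / (n + alpha)."""
    rng = random.Random(seed)
    sizes = []  # sizes[k] = number of points currently in cluster k
    for n in range(n_points):
        r = rng.uniform(0, n + alpha)
        if r < alpha or not sizes:
            sizes.append(1)  # open a new cluster
        else:
            # walk the cumulative sizes to pick an existing cluster
            acc = alpha
            for k in range(len(sizes)):
                acc += sizes[k]
                if r < acc:
                    sizes[k] += 1
                    break
            else:
                sizes[-1] += 1  # guard against floating-point edge cases
    return sizes
```

Running this with a small versus a large α shows the behavior the text describes: far fewer clusters form when α is small, and early clusters absorb most points either way.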
The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking construction, one of several equally valid ways to assign songs to clusters [11]. While the mathematical details of the algorithm can be found in [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is α, the same parameter discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM: on the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm is expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation is the upper bound on the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
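For reference, the DPGMM class the thesis used has since been removed from scikit-learn; a hedged sketch of the same three arguments (α, the data, and the cluster upper bound) with its modern replacement, BayesianGaussianMixture under a Dirichlet-process prior, might look like the following. The toy data and parameter values here are illustrative, not the thesis's.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Toy stand-in for the song feature matrix: N data points x m features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(6.0, 1.0, (100, 5))])

# weight_concentration_prior plays the role of alpha; n_components is the
# upper bound on the number of clusters (the thesis used 50).
dpgmm = BayesianGaussianMixture(
    n_components=50,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,  # the concentration parameter alpha
    max_iter=500,
    random_state=0,
)
labels = dpgmm.fit_predict(X)
n_used = len(np.unique(labels))  # clusters the DP actually populated
```

As in the thesis's setup, n_components is only a ceiling: the stick-breaking prior concentrates the mixture weights so that typically far fewer than 50 components receive any data.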
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters,
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the ID of this artist on musicbrainz.org)
artist_mbtags: shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935) (max loudness during each segment)
segments_loudness_max_time: shape = (935) (time of the max loudness during each segment)
segments_loudness_start: shape = (935) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener has his or her own opinions about what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the prevalence of different tones, i.e., sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and of music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; merely speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
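The per-song filter described above can be sketched as follows. This is illustrative code, not the appendix implementation: the songs here are plain dicts carrying an artist_mbtags list, standing in for the HDF5 access code that Appendix A uses to read the dataset.

```python
# The genre whitelist from above, lowercased, as a set for O(1) membership tests.
target_genres = {
    'house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
    'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
    'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic',
}

def is_electronic(artist_mbtags):
    """True if any of the artist's musicbrainz.org tags names an EM genre."""
    return any(tag.lower() in target_genres for tag in artist_mbtags)

def collect_electronic_songs(songs):
    """Keep the songs whose artist carries at least one EM tag."""
    return [s for s in songs if is_electronic(s['artist_mbtags'])]
```

As the text notes, this is an artist-level proxy: a song by a tagged artist is kept even if that particular track is not electronic, which motivates the artist blacklist introduced in Section 3.1.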
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest detection. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

\rho_{CT,c} = \frac{1}{12} \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation runs over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
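A minimal sketch of this template-matching step (my own illustrative code, not the appendix implementation): it builds the 4 chord templates in all 12 transpositions and applies the correlation formula above to pick the best chord for a single chroma frame.

```python
import math

# The 4 chord templates rooted at C (index 0 = C); a 1 marks a chord tone.
BASE_TEMPLATES = {
    "maj":  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    "min":  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "dom7": [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    "min7": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}

def rotate(t, k):
    """Transpose a 12-dim template up by k semitones."""
    return t[-k:] + t[:-k] if k else t[:]

def corr(x, y):
    """The correlation from the formula above: mean-centred cross products
    over (12 * sigma_x * sigma_y), using population standard deviations."""
    mx, my = sum(x) / 12, sum(y) / 12
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / 12)
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / 12)
    if sx == 0 or sy == 0:
        return 0.0
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (12 * sx * sy)

def most_likely_chord(chroma):
    """Return (root, type) of the best-matching template for a chroma frame."""
    best, best_rho = None, -2.0
    for name, base in BASE_TEMPLATES.items():
        for root in range(12):
            rho = corr(rotate(base, root), chroma)
            if rho > best_rho:
                best, best_rho = (root, name), rho
    return best
```

In the thesis pipeline this selection is applied to each smoothed block of chroma frames rather than to raw frames, as the next paragraph explains.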
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed the time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that could then be fed into the Dirichlet Process algorithm.
[Figure: the pitch-processing pipeline, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix where N is the number of time frames in the song and 12 the number of pitch classes; the figure shows the first 5 time frames. (2) Average the distribution of pitches over every block of 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho; here the first block matches F# major, template (0,1,0,0,0,0,1,0,0,0,1,0). (4) For each pair of adjacent chords, compute the change between them, e.g. F# major to G# major is a major-to-major change with step size 2 and chord shift code 6, and increment the corresponding count in a table of chord change frequencies (192 possible chord changes): chord_changes[6] += 1. The end product is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
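The chord-change counting and per-second normalization described above can be sketched as follows. The 192 categories factor as 4 source chord types × 4 target chord types × 12 root intervals; the specific numbering of the codes below is my own assumption and may differ from the ordering used in the thesis appendix.

```python
CHORD_TYPES = ["maj", "min", "dom7", "min7"]

def chord_change_code(from_chord, to_chord):
    """Map a (root, type) -> (root, type) transition to one of 192 codes:
    4 source types x 4 target types x 12 root intervals. The layout here is
    hypothetical; only the 192-way factorization comes from the text."""
    (r1, t1), (r2, t2) = from_chord, to_chord
    step = (r2 - r1) % 12  # key-independent root movement in semitones
    return (CHORD_TYPES.index(t1) * 4 + CHORD_TYPES.index(t2)) * 12 + step

def chord_change_vector(chords, duration_sec):
    """Count changes between adjacent chord blocks, normalized per second."""
    counts = [0.0] * 192
    for a, b in zip(chords, chords[1:]):
        counts[chord_change_code(a, b)] += 1
    return [c / duration_sec for c in counts]
```

Because only the interval between roots is stored, two songs playing the same progression in different keys produce identical vectors, which is the key-invariance property the text argues for.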
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model [8] to find a meaningful way to compare timbre uniformly across all songs. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any one type of sound. As seen in Figure 2.2, there are significantly more songs from 2000–2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre dimensions for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
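The BIC sweep described above can be sketched with scikit-learn's current GaussianMixture API. The data here is a small toy stand-in (the thesis swept 10 to 100 components over the real 16,800 × 12 matrix of sampled timbre frames), so the candidate range is reduced accordingly.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy stand-in for the 16,800 x 12 matrix of sampled timbre frames:
# three well-separated 12-dim blobs of 300 frames each.
frames = np.vstack([rng.normal(loc, 1.0, (300, 12)) for loc in (-4.0, 0.0, 4.0)])

# Fit a GMM for each candidate cluster count and keep the lowest BIC.
best_k, best_bic, best_model = None, np.inf, None
for k in range(2, 11):  # the thesis swept 10..100; smaller range for the toy data
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

cluster_means = best_model.means_  # analogous to the saved 12-dim cluster means
```

Each incoming timbre frame can then be assigned to its most likely cluster with best_model.predict, which is the per-song frequency count the text describes.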
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data were processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
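A sketch of the resulting per-song feature assembly. The function name, the default of 4 timbre copies, and the scale argument are my own illustrative choices; the thesis reports a scaling factor of k = 10 but does not name these parameters.

```python
def build_feature_vector(chord_changes, timbre_freqs, timbre_copies=4, scale=10.0):
    """Concatenate the 192-dim chord-change and 46-dim timbre frequency vectors.
    timbre_copies repeats the timbre block to counter pitch's 192-vs-46
    dominance (4 copies gives 184 timbre features); scale multiplies everything
    so the small per-second frequencies land in a range where reasonable
    values of alpha can separate the songs."""
    assert len(chord_changes) == 192 and len(timbre_freqs) == 46
    vec = list(chord_changes) + list(timbre_freqs) * timbre_copies
    return [scale * v for v in vec]
```

The need for the scale factor, and why simply cranking up α instead is mathematically inappropriate, is discussed next.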
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and often hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000–2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process: extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, and a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, doing so uses the Dirichlet Process in a way it mathematically should not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify the different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echoing
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, leaving a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial, metal, and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, a few songs in that cluster came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together around the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists among the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer and pioneer of ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. "Trance" was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
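The sped-up listening comparison can be reproduced with a simple resampling sketch (hypothetical helper names; like raising a record's playback speed, this shifts pitch up along with tempo, which matches the effect described):

```python
import numpy as np

def speed_up(samples, factor=1.5):
    """Resample a mono signal so it plays `factor` times faster."""
    n = len(samples)
    positions = np.arange(0, n, factor)              # fractional read positions
    return np.interp(positions, np.arange(n), samples)

# One second of A440 at 44.1 kHz, played back at 1.5x speed.
sr = 44100
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
faster = speed_up(tone, 1.5)
assert len(faster) == sr * 2 // 3  # two thirds as many samples remain
```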
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing those clusters to the ones formed at other values of α. As we increase α, more clusters are formed and the distinctions between clusters become more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither song sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3 for α = 0.1 contained a similar number of songs, and a similar distribution of release years, to cluster 9 for α = 0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs featuring orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together by certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters for α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 featured vocal-free, ethereal, space-synth sounds, so it makes sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
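The effect of α on the number of occupied clusters can be illustrated with scikit-learn's truncated variational approximation to a Dirichlet Process Gaussian Mixture. This is a sketch on synthetic feature vectors, not the thesis's actual pitch/timbre features or implementation:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Three well-separated synthetic "genres" in a 4-dimensional feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(200, 4)) for m in (0.0, 2.0, 4.0)])

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=30,                              # truncation level
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,             # the concentration parameter
        random_state=0,
    ).fit(X)
    used = np.unique(dpgmm.predict(X)).size           # clusters actually occupied
    print("alpha =", alpha, "-> occupied clusters:", used)
```

Larger α places more prior weight on occupying additional components, which parallels the growth from 14 to 22 significant clusters observed across the three runs.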
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique, so I discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing clusters: the y-axis values for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as, for example, those of cluster 28 for α = 0.1. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies to address them; I then offer potential paths for researchers to build upon my experiment, and close with final remarks on this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks I did see many important artists, which lends some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
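The artist-tag filter described above can be sketched as follows (the genre list is abbreviated from the appendix code, and the Last.fm refinement would amount to passing per-track tags instead of artist tags):

```python
# Abbreviated EM genre list; the full list appears in Appendix A.1.
TARGET_GENRES = {"house", "techno", "trance", "jungle", "breakbeat", "dubstep",
                 "downtempo", "industrial", "synthpop", "idm", "ambient", "electronic"}

def is_em_song(artist_tags):
    """Keep a song when any of its artist's tags is a predetermined EM genre."""
    return any(tag.lower() in TARGET_GENRES for tag in artist_tags)

assert is_em_song(["Techno", "german"])           # matched on an artist tag
assert not is_em_song(["classic rock", "blues"])  # no EM tag, song discarded
```

Note that this sketch uses exact tag matching, whereas the appendix code uses a substring test against the stringified tag array.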
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
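The BIC-based selection of the number of timbre categories mentioned above can be sketched as a sweep over mixture sizes (toy three-cluster data here; the reported minimum at 46 categories came from the real timbre frames):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy timbre-like frames drawn from three separated Gaussians.
rng = np.random.default_rng(1)
frames = np.vstack([rng.normal(loc=m, size=(150, 3)) for m in (-3.0, 0.0, 3.0)])

# Fit a mixture for each candidate category count and keep the BIC minimizer.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(frames).bic(frames)
        for k in range(1, 8)}
best_k = min(bics, key=bics.get)
print("BIC-optimal number of categories:", best_k)
```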
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small research group without extensive funding and influence. Once these problems are resolved, and the songs accessed from the dataset and the methods for comparing songs to each other are settled, the next steps would be to analyze the results further. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has until now been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import hdf5_getters  # not on adroit
import numpy as np

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for each electronic song in the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"'title'", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"'title'", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
63
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw msd data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
'''given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, mar 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, mar 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, oct 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, jan 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, jul 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, mar 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, mar 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, dec 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
center image has α set to 0.01, which is too small and results in all of the data being grouped under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection".
The last argument that a user inputs for the scikit-learn DPGMM implementation is an upper bound on the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
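As a concrete sketch of this interface: the DPGMM class available in 2016-era scikit-learn has since been removed, and its modern replacement is BayesianGaussianMixture, where weight_concentration_prior plays the role of α and n_components is the upper bound on clusters. The toy two-blob data below stands in for the song feature vectors; all variable names here are mine.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Toy stand-in for the song feature matrix: N points of m features each,
# drawn from two well-separated blobs, passed as a nested list (N lists of length m).
rng = np.random.RandomState(0)
data = np.vstack([rng.normal(0, 1, (100, 3)),
                  rng.normal(6, 1, (100, 3))]).tolist()

dpgmm = BayesianGaussianMixture(
    n_components=10,                   # upper bound on the number of clusters
    weight_concentration_prior=1.0,    # the alpha parameter being tuned
    weight_concentration_prior_type='dirichlet_process',
    random_state=0,
).fit(data)

labels = dpgmm.predict(data)
n_clusters_used = len(set(labels))     # the DP decides how many clusters it needs
```

Tuning weight_concentration_prior up or down here reproduces the α behavior discussed above: small values pull everything into few clusters, large values allow more.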
2.2 Feature Selection
One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters,
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as an HDF5 object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in, by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in, by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener has his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's [8], look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it distinguishes tones that sound different despite having the same pitch. Therefore segments_timbre is another feature that is considered for each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM like drum 'n' bass and happycore stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second", so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist and not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
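The tag-matching step itself reduces to a small membership test against this list. A minimal sketch (the helper name is_electronic is mine, not from the thesis code):

```python
TARGET_GENRES = {'house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic'}

def is_electronic(artist_mbtags):
    # Keep a song if any of its artist's musicbrainz.org tags is a target genre.
    return any(tag.lower() in TARGET_GENRES for tag in artist_mbtags)
```

In the actual pipeline this test would run on the artist_mbtags array read from each HDF5 song file while iterating over the dataset.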
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is computed for every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT · σ_c)

where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
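A small self-contained sketch of this template-matching step, using just two toy C-rooted templates (the thesis compares all 48 templates; the +0.01 guards against zero variance mirror the appendix code):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def rho(template, chroma):
    # Correlation-style score between a binary chord template and a chroma frame.
    mt, mc = mean(template), mean(chroma)
    st, sc = std(template), std(chroma)
    return sum((t - mt) * (c - mc) for t, c in zip(template, chroma)) / ((st + 0.01) * (sc + 0.01))

TEMPLATES = {
    'C major': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],  # C, E, G
    'C minor': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],  # C, Eb, G
}

def best_chord(chroma):
    # Pick the template whose |rho| against the observed frame is largest.
    return max(TEMPLATES, key=lambda name: abs(rho(TEMPLATES[name], chroma)))
```

A frame with energy concentrated on C, E, and G therefore matches the major template more strongly than the minor one.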
After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I include a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: the pitch-processing pipeline, illustrated on "Firestarter" by The Prodigy. Step 1: start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes (the figure shows the first 5 time frames). Step 2: average the distribution of pitches over every 5 time frames. Step 3: calculate the most likely chord for each block using Spearman's rho; here the first block matches F# major, (0,1,0,0,0,0,1,0,0,0,1,0). Step 4: for each pair of adjacent chords, calculate the change between them (192 possible chord changes) and increment its count in a table of chord change frequencies; in the example, a major-to-major shift of step size 2 has chord shift code 6, so chord_changes[6] += 1. The figure ends with the song's final 192-element vector, where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
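The chord-change bookkeeping can be sketched as follows. The code assignment ((type_from · 4 + type_to) · 12 + interval) is a hypothetical numbering of mine that merely matches the count of 4 × 4 × 12 = 192 possible changes; the thesis does not spell out its exact scheme.

```python
def chord_change_code(c1, c2):
    # A chord is (chord_type 0-3: maj/min/dom7/min7, root pitch class 0-11).
    t1, r1 = c1
    t2, r2 = c2
    interval = (r2 - r1) % 12           # key-independent root movement
    return (t1 * 4 + t2) * 12 + interval

def chord_change_vector(chords, duration_seconds):
    # Count every adjacent chord change, then normalize to changes per second.
    counts = [0.0] * 192
    for a, b in zip(chords, chords[1:]):
        counts[chord_change_code(a, b)] += 1
    return [c / duration_seconds for c in counts]
```

Because only the interval between roots is stored, two songs playing the same progression in different keys produce the same vector, which is the key-invariance property described above.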
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song Finally
as with the pitch data I divided all observed counts by the duration of the song in
order to normalize each songrsquos timbre counts
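The steps above can be sketched as follows with scikit-learn's GaussianMixture. Everything here is illustrative: the data is a synthetic stand-in for the real 16,800 timbre frames, and the sweep is smaller than the 10-100 range described in the text.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the 12-dimensional timbre frames
# (the thesis sampled 42 x 20 x 20 = 16,800 real frames).
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(loc=m, scale=1.0, size=(300, 12))
                    for m in (-4.0, 0.0, 4.0)])

# Sweep the number of components and keep the model with the lowest BIC
# (the thesis swept 10 to 100 and found a minimum at 46).
best_k, best_bic, best_gmm = None, np.inf, None
for k in range(2, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic, best_gmm = k, bic, gmm

# Save the per-cluster means, analogous to the 46 timbre centroids.
centroids = best_gmm.means_

# For one hypothetical song: assign each frame to its most likely
# cluster, count occurrences, and normalize by song duration.
song_frames = frames[:40]
duration = 180.0  # seconds, illustrative
counts = np.bincount(best_gmm.predict(song_frames), minlength=best_k)
song_features = counts / duration
print(best_k, centroids.shape)
```

On this synthetic data the BIC sweep recovers the three planted clusters; on the real frames the same loop, run from 10 to 100, would play the role of the 46-cluster selection.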
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
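A minimal sketch of this duplication strategy; the function name and the copy count are illustrative, as the text does not specify how many duplicates were used.

```python
import numpy as np

def weighted_features(chord_freqs, timbre_freqs, copies=4):
    """Concatenate the chord-change frequencies with several duplicates
    of the timbre frequencies, so the two feature groups contribute
    comparably many dimensions (192 vs. 46 * copies)."""
    return np.concatenate([chord_freqs] + [timbre_freqs] * copies)

# Hypothetical song: 192 chord-change and 46 timbre-category frequencies.
chords = np.zeros(192)
timbre = np.ones(46)
x = weighted_features(chords, timbre, copies=4)
print(x.shape)  # (376,)
```

With four copies, timbre contributes 184 of 376 dimensions, roughly balancing the two groups.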
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, doing so uses the Dirichlet Process in a way it mathematically should not be used. Therefore, multiplying all of the data by a constant value so that we can work in the appropriate range of α is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not tags specific to each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. Some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
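The blacklist step amounts to a simple filter over the song records. A minimal sketch; the blacklist contents (beyond Electric Light Orchestra, which the text names) and the record layout are illustrative.

```python
# Hypothetical blacklist of artists flagged as mostly non-EM;
# Electric Light Orchestra is the example named in the text.
BLACKLIST = {'Electric Light Orchestra'}

def is_em_song(song):
    """Keep a song only if its artist was not manually blacklisted."""
    return song['artist_name'] not in BLACKLIST

songs = [
    {'artist_name': 'Electric Light Orchestra', 'year': 1977},
    {'artist_name': 'Kraftwerk', 'year': 1978},
]
filtered = [s for s in songs if is_em_song(s)]
print([s['artist_name'] for s in filtered])  # ['Kraftwerk']
```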
The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit on the number of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters, respectively.
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial, space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but echoing
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin; rhythmic emphasis on first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist, ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast, ambient, classic orchestral
23 | 578 | Very repetitive acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s, and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer and pioneer of ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster and examining the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
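The chord-change type numbers cited above (0, 60, 120, and 180) follow from the indexing used in Appendix A.2, which combines a chord-quality pair with a 0-11 semitone root shift into one of 4 × 4 × 12 = 192 categories. A minimal sketch; the quality numbering (1 = major, 2 = minor, 3 = dominant-7th major, 4 = dominant-7th minor) is inferred here from matching those four values, not stated explicitly in the text.

```python
def chord_change_category(q1, q2, note_shift):
    """Map a chord change to one of the 192 categories.
    q1, q2 are the qualities of the two adjacent chords and
    note_shift is the root movement in semitones (0-11)."""
    key_shift = 4 * (q1 - 1) + q2          # quality pair: 1..16
    return 12 * (key_shift - 1) + note_shift

# The four "same quality, no note change" types cited in the text:
print([chord_change_category(q, q, 0) for q in (1, 2, 3, 4)])  # [0, 60, 120, 180]
```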
While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced differences in instrumentation and mood. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters for the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also illustrate the difficulty of distinguishing the clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to remedy them; I then offer potential paths for researchers to build upon my experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it does at all? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# strip path separators from the shard name for the output filename
sortedpitchdata = ('/scratch/network/mssilver/mssilver/msd_data/raw_'
                   + re.sub('/', '', sys.argv[1]) + '.txt')
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1

    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean value of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''
238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in
segments_pitches]242 return segments_pitches_new243
65
244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250
251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo
252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR
CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR
CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7
CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7
CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285
286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS
TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)
(npstd(timbre_vector)+001))
66
293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, mar 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, mar 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, oct 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, jan 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, jul 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, mar 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, mar 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, dec 2005.
Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α
we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:
artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade in, at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (~ musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's [8], look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the presence of different tones: sounds that differ in character despite having the same pitch. Therefore segments_timbre is another feature that is considered in each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated beats of dubstep or glitch music from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
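The per-second normalization mentioned above can be sketched as follows. This is a minimal illustration: the counts and durations are hypothetical, not values from the dataset.

```python
# Dividing a song's raw feature counts by its track duration expresses every
# feature in "per second" units, so songs of different lengths (and tempos)
# become directly comparable. The numbers below are hypothetical.
def per_second(counts, duration):
    """Normalize a vector of event counts by the track duration in seconds."""
    return [c / duration for c in counts]

# Same raw counts, but the second track is twice as long, so its rates halve.
fast_song = per_second([120, 30], 120.0)  # [1.0, 0.25]
slow_song = per_second([120, 30], 240.0)  # [0.5, 0.125]
```

A faster song packs more chord changes and timbre frames into each second, which is how tempo leaks into the normalized features without being included explicitly.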
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than of the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
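The tag-matching filter described in this section can be sketched as follows; the song records are hypothetical stand-ins for MSD entries, and the genre list is abbreviated to a few entries.

```python
# Keep a song if any of its artist-level musicbrainz.org tags appears in the
# electronic genre list. target_genres is abbreviated here, and the two song
# records are hypothetical examples rather than real MSD entries.
target_genres = ['house', 'techno', 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'downtempo', 'idm', 'ambient', 'electronic']

def is_electronic(artist_mbtags):
    """True if any artist tag matches the electronic genre list."""
    return any(tag.lower() in target_genres for tag in artist_mbtags)

songs = [
    {'title': 'Song A', 'artist_mbtags': ['Techno', 'german']},
    {'title': 'Song B', 'artist_mbtags': ['rock', 'disco']},  # disco excluded by design
]
electronic_songs = [s for s in songs if is_electronic(s['artist_mbtags'])]
# electronic_songs keeps only 'Song A'
```

Note that the filter operates on artist tags, so a primarily rock artist with one matching tag would slip through, which is exactly the false-positive risk discussed above.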
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords"
and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as
CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed against every template chord:
ρ(CT, c) = Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)
where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note
that the summation is over each individual pitch in the 12 pitch classes The chord
template with the highest value of ρ is selected as the chord for the time frame
After this is performed for each time frame the values are smoothed and then the
change between adjacent chords is observed The reasoning behind this step is that
by measuring the relative distance between chords rather than the chords themselves
all songs can be compared in the same manner even though they may have different
key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".
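The template-matching step above can be sketched as follows. This is an illustrative re-implementation, not the study's code; the template set covers only C-rooted chords for brevity, while a full implementation would include all 12 rotations of each chord type:

```python
import math

# Template chords: 1s mark the notes in the chord (C-rooted only here)
TEMPLATES = {
    "C major": [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    "C minor": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    "C dom7":  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    "C min7":  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}

def correlation(template, chroma):
    """Correlate a 0/1 template chord with an observed 12-bin chroma frame.
    This equals the thesis's formula up to a constant factor of n, which
    does not affect which template scores highest."""
    n = len(template)
    mt = sum(template) / n
    mc = sum(chroma) / n
    cov = sum((t - mt) * (c - mc) for t, c in zip(template, chroma))
    st = math.sqrt(sum((t - mt) ** 2 for t in template))
    sc = math.sqrt(sum((c - mc) ** 2 for c in chroma))
    return cov / (st * sc)

def best_chord(chroma):
    """Pick the template chord with the highest correlation to the frame."""
    return max(TEMPLATES, key=lambda name: correlation(TEMPLATES[name], chroma))
```

A chroma frame with strong energy on C, E, and G would be labeled "C major" by this sketch.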
In my preliminary implementation of this method on an electronic dance music
corpus I made a few modifications to Mauchrsquos study First I smoothed out time
frames before computing the most probable chords rather than smoothing the most
probable chords I did this to save time and to reduce volatility in the chord
measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and for preliminary testing appeared to be a good interval for each
time block Second as mentioned in the literature section I did not abstract the
chord changes into H-topics This decision also stemmed from time constraints since
deriving semantic chord meaning from EDM songs would require careful research
into the types of harmonies and sounds common in that genre of music Below I
included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and of how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
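The smoothing modification described above (averaging chroma frames in blocks of 5 before chord detection) can be sketched as:

```python
def average_blocks(chroma, block=5):
    """Average an N x 12 chroma sequence over non-overlapping blocks of
    `block` time frames, smoothing the data before chord detection."""
    averaged = []
    for i in range(0, len(chroma), block):
        chunk = chroma[i:i + block]
        # average each of the 12 pitch classes across the block
        averaged.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return averaged
```

For the 935-frame reference song, this yields 187 five-frame blocks, each covering slightly under one second.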
[Figure: the pitch preprocessing pipeline illustrated on "Firestarter" by The Prodigy. Start with the raw pitch data, an N×12 vector where N is the number of time frames in the song and 12 the number of pitch classes (the first 5 time frames are shown). Average the distribution of pitches over every 5 time frames, then calculate the most likely chord for each block using Spearman's rho. For each pair of adjacent chords, calculate the change between them and increment the count in a table of chord change frequencies (192 possible chord changes). The result is a final 192-element vector chord_changes where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the numbers by
the length of the song, so that each song's number of chord changes was measured per
second
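Putting these pieces together, the encoding and per-second normalization can be sketched as follows. The 4 × 4 × 12 = 192 code layout here is my assumption rather than the thesis's stated scheme, though it reproduces the codes cited in the Analysis section (0 for major to major with no root change, 60 for minor to minor, 120 and 180 for the two dominant 7th types):

```python
QUALITIES = ["maj", "min", "dom7", "min7"]

def change_code(prev, cur):
    """Map a pair of (quality, root) chords to a code in [0, 192):
    4 x 4 quality transitions times 12 possible root intervals."""
    (q1, r1), (q2, r2) = prev, cur
    interval = (r2 - r1) % 12  # key-independent distance between roots
    return (QUALITIES.index(q1) * 4 + QUALITIES.index(q2)) * 12 + interval

def chord_change_vector(chords, duration_sec):
    """Tally chord changes between adjacent blocks into a 192-element
    vector, then normalize to changes per second."""
    counts = [0.0] * 192
    for prev, cur in zip(chords, chords[1:]):
        counts[change_code(prev, cur)] += 1
    return [c / duration_sec for c in counts]
```

Because only the interval between roots is encoded, two songs in different keys that make the same harmonic moves produce the same codes.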
2.3.3 Timbre Preprocessing
For timbre I also used Mauchrsquos model to find a meaningful way to compare timbre
uniformly across all songs [8] After collecting all song metadata I took a random
sample of 20 songs from each year starting at 1970. The reason I fixed the sampling at 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen
in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include
a disproportionate amount of more recent songs In order to not miss out on sounds
that may be more prevalent in older songs I required a set number of songs from each
year Next from each randomly selected song I selected 20 random timbre frames
in order to prevent any biases in data collection within each song In total there
were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames
using a Gaussian Mixture Model (GMM) varying the number of clusters from 10 to
100 and selecting the number of clusters with the lowest Bayes Information Criterion
(BIC), a statistical measure commonly used to identify the best-fitting model. The
BIC was minimized at 46 timbre clusters I then re-ran the GMM with 46 clusters
and saved the mean values of each of the 12 timbre segments for each cluster formed
In the same way that every song had the same 192 chord changes whose frequencies
could be compared between songs each song now had the same 46 timbre clusters
but different frequencies in each song When reading in the metadata from each song
I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
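The BIC-based model selection described above can be sketched with scikit-learn. The toy data below stands in for the 16,800 sampled 12-dimensional timbre frames, and the candidate range is shrunk from the thesis's 10-100 for speed:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_timbre_clusters(frames, k_range):
    """Fit a GMM for each candidate cluster count and keep the model
    with the lowest Bayes Information Criterion (BIC)."""
    best_model, best_bic = None, np.inf
    for k in k_range:
        gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
        bic = gmm.bic(frames)  # lower BIC means a better fit/complexity trade-off
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model

# Toy data: two well-separated blobs in a 12-D "timbre space"
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0, 1, (150, 12)), rng.normal(10, 1, (150, 12))])
model = fit_timbre_clusters(frames, range(1, 5))
```

The winning model's component means play the role of the 46 saved timbre-cluster means; each new frame is then assigned to its most likely component.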
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed I ran the Dirichlet Process on the
data For each song I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list giving each song a total of 238
features However there is a problem with this setup The pitch data will inherently
dominate the clustering process since it contains almost 3 times as many features
as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and
concatenating that to the feature set of each song While this strategy runs the risk
of corrupting the feature set and turning it into something that does not accurately
represent each song it is important to keep in mind that even without duplicating
the timbre vector the feature set consists of two separate feature sets concatenated
to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
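A sketch of the resulting feature construction follows. The thesis does not state the duplication factor, so the factor of 4 here is purely illustrative (4 × 46 = 184, roughly balancing the 192 pitch features):

```python
def build_feature_vector(chord_changes, timbre_counts, timbre_copies=4):
    """Concatenate the 192 chord-change frequencies with several copies
    of the 46 timbre-cluster frequencies, so timbre carries roughly the
    same weight as pitch in the clustering (4 x 46 = 184 vs. 192)."""
    return list(chord_changes) + list(timbre_counts) * timbre_copies
```

With 4 copies, each song becomes a 192 + 184 = 376-element vector instead of the unbalanced 238-element one.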
After this modification I tweaked a few more parameters before obtaining my
final results Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second but it also had the undesired effect
of making the data too small Timbre and pitch frequencies per second were almost
always less than 10 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude to well over 10 million fixed the problem, but
this solution presented two problems First tuning α to experiment with different
ways to cluster the music would be problematic since I would have to work with
an enormous range of possible values for α Second pushing α to such high values
is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high will in principle assign nearly every new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider
range of flexibility when assigning clusters While this may be possible by varying
the values of α an extreme amount with the data as it currently is we are using
the Dirichlet Process in a way it should mathematically not be used Therefore
multiplying all of the data by a constant value so that we can work in the appropriate
range of α is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor. After initial runs of the Dirichlet Process, I found
that there was a slight issue with some of the earlier songs Since I had only artist
genre tags not specific song tags for each song I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre including
the generic term 'electronic'. There were some bands, mostly older ones from the
1960s and 1970s like Electric Light Orchestra which had some electronic music but
mostly featured rock funk disco or another genre Given that these artists featured
mostly non-electronic songs I decided to exclude them from my study and generate
a blacklist indicating these music artists While it was infeasible to look through
every single song and determine whether it was electronic or not I was able to look
over the earliest songs in each cluster These songs were the most important to verify
as electronic because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for
The goal of this thesis is to identify different groups in which EM songs are
clustered and identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change
and timbre category frequencies in each category as well as other metadata putting
more semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process For this reason
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
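In current scikit-learn, the DPGMM class used here has been replaced by BayesianGaussianMixture with a Dirichlet Process prior, so a modern re-implementation might look like the sketch below; the toy data stands in for the scaled song feature vectors:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def cluster_songs(features, alpha, max_clusters=50):
    """Cluster song feature vectors with a truncated Dirichlet Process
    mixture; alpha is the concentration parameter and max_clusters the
    upper limit on components (50 in the thesis)."""
    dp = BayesianGaussianMixture(
        n_components=max_clusters,
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,
        random_state=0,
    )
    return dp.fit_predict(features)

# Toy data: two well-separated groups standing in for the song features
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (60, 5)), rng.normal(5, 0.5, (60, 5))])
labels = cluster_songs(X, alpha=0.1, max_clusters=10)
```

Raising alpha encourages the model to spread mass over more of the 50 available components, mirroring the 9 / 14 / 19 cluster counts observed for the three α values.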
32 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results The green
lines correspond to timbre and the blue lines to pitch
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster paced, 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in 80s, then starting in 90s downtempo, trip-hop, nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently Not
surprisingly the distributions of the years of songs in most of the clusters were skewed
to the left because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 22) However some of the distributions vary significantly
for individual clusters and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains
a heavier left tail indicating a larger number of songs from the 70s 80s and 90s
Inside the cluster the genres of music varied significantly from a traditional music
lens That is the cluster contained some songs with nearly all traditional rock
instruments others with purely synths and others somewhere in between all which
would normally be classified as different EM genres However under the Dirichlet
Process, these songs were lumped together under the common theme of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and John Hassell who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel The other small cluster number 8 contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats
Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but increases rapidly in popularity
This cluster contains songs with hypnotically repetitive rhythm strong and ethereal
synths and an equally strong drum-like beat Given the emergence of trance in
the 1990s and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s this distribution of years makes sense Looking
at the earliest artists in this cluster one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically,
trance music was comparable to house and ambient music played at increased speed. Trance
music was a term not used extensively until the early 1990s but ambient and house
music were already mainstream by the 1980s so it would make sense that trance
evolved in this manner However this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres Lastly we look at the timbre category and chord change distributions
for each cluster In theory these clusters should have significantly different peaks
of chord changes and timbre categories reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0 60 120 and 180 chord changes are frequently observed because it implies
that chords in the song occurring next to each other are remaining in the same key
for the majority of the song The timbre categories on the other hand are more
difficult to interpret intuitively. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8] While
this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in my study. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music In summary below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music Some research
revealed that these artists were indeed groundbreaking for their time so my findings
are consistent with existing literature The difference however between existing
accounts and mine is that from a quantitatively computed perspective I found some
new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α it is not only worth looking at interesting phenomena in
the clusters formed for that specific value but also comparing the clusters formed
to other values of α Since we are increasing the value of α more clusters will
be formed and the distinctions between each cluster will be more nuanced With
α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music However when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.
One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1
contained songs that featured orchestral string instruments especially violin The
songs themselves varied significantly according to traditional genres from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal
band which contained violin interludes This clustering raises an interesting point
that music that sounds very different based on traditional genres could be grouped
together based on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables The timbrechord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster For
example the last timbre category is significantly pronounced for clusters 5 and 18
and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters like 28 contained one timbre category that completely dominated
all the others Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine which matches the graph
Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed. 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other
classifications seemed more dubious Not only did the songs within each cluster seem
to often vary significantly but the differences between many clusters appeared nearly
indistinguishable The chord change and timbre charts also support the difficulty
in distinguishing different clusters. The y-axis values are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as cluster 28_0.1's, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and resulting in clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment and closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications these songs did not come close to all of the electronic
music that was available From looking through the tracks I did see many important
artists meaning that there was some credibility to the dataset However there were
several other artists I was surprised to see missing and the artists included contained
only a limited number of popular songs Some traditionally defined genres like
dubstep were missing entirely from the dataset and the most recent songs came
from the year 2010 which meant that the past 5 years of rapid expansions in EM
were not accounted for Building a sufficient corpus of EM data is very difficult
arguably more so than for other genres, because songs may be remixed by multiple
artists further blurring the line between original content and modifications For this
reason I considered my thesis to be a proof of concept Although the data I used
may not be ideal I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level.
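The artist-tag filter described above, and its weakness, can be sketched as follows; the genre whitelist and song records here are illustrative, not the exact data used (the real pipeline read artist tags from MSD HDF5 files, as in Appendix A.1):

```python
# Illustrative genre whitelist; the full list appears in Appendix A.1.
EM_GENRES = {'house', 'techno', 'trance', 'breakbeat', 'jungle', 'ambient'}

# Hypothetical song records carrying artist-level (not song-level) tags.
songs = [
    {'title': 'Track A', 'artist_tags': ['techno', 'german']},
    {'title': 'Track B', 'artist_tags': ['rock', 'indie']},
    {'title': 'Track C', 'artist_tags': ['rock', 'ambient']},  # mostly-rock artist
]

def is_em(song):
    # Flag a song as EM if ANY artist-level tag is an EM genre. This is
    # the weakness noted above: a rock song by an artist who has produced
    # some ambient music still passes the filter.
    return any(tag in EM_GENRES for tag in song['artist_tags'])

em_songs = [s['title'] for s in songs if is_em(s)]
print(em_songs)  # ['Track A', 'Track C']
```

Song-level tags such as Last.fm's would let the predicate test the song itself rather than its artist.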
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had straightforward semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
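The BIC-based choice of the number of timbre categories mentioned above can be sketched with scikit-learn; synthetic two-dimensional data stands in here for the sampled 12-dimensional timbre frames the thesis actually fit:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for timbre frames: three well-separated groups.
rng = np.random.RandomState(0)
frames = np.vstack([rng.normal(loc, 0.3, (100, 2)) for loc in (0, 3, 6)])

# Fit GMMs with increasing component counts and keep the BIC-minimizing
# one, mirroring how the 46 timbre categories were selected.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(frames).bic(frames)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print(best_k)
```

BIC trades goodness of fit against model complexity, so the minimizing component count is a principled (if not perceptually validated) choice of category count.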
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
               1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
               1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
               1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(timbre_vector))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade in, at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the presence of tones that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered for each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and of music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happy hardcore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
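The filtering step described above can be sketched as follows. This is my reconstruction, not the thesis code: the matching logic is written as a standalone function, and in the actual pipeline the tag list for each song would be read from its HDF5 file (e.g. via the MSD's hdf5_getters helpers).

```python
# Sketch of the genre-filtering step (a reconstruction, not the thesis code).
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

def is_electronic(artist_mbtags, targets=tuple(target_genres)):
    """True if any MusicBrainz artist tag matches an electronic genre."""
    tags = {t.strip().lower() for t in artist_mbtags}
    return not tags.isdisjoint(targets)
```

In the full pipeline, every song whose artist tags satisfy this predicate would be saved for preprocessing; all others would be skipped.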
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest detection. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as
CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

ρ_{CT,c} = (1/12) Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)
where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of those values, and the operations on c are analogous. Note that the summation runs over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: converting pitch metadata into a chord change vector, using "Firestarter" by The Prodigy as an example. (1) Start with the raw pitch data, an N x 12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every block of 5 time frames. (3) For each block, calculate the most likely chord using Spearman's rho (here, F major). (4) For each pair of adjacent chords (here F major to G major: major to major, step size 2, chord shift code 6), increment the corresponding entry in a table of chord change frequencies (192 possible chord changes), i.e. chord_changes[6] += 1. (5) The result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
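The chord-change encoding and per-second normalization might look like the following sketch. The exact numbering of the 192 codes is not specified beyond the figure, so the indexing scheme here (16 ordered type pairs x 12 semitone steps) is my assumption.

```python
import numpy as np

CHORD_TYPES = ['maj', 'min', 'dom7', 'min7']  # 4 types -> 16 ordered pairs

def chord_change_vector(chords, duration):
    """Encode a chord sequence as a per-second frequency vector over the
    192 possible chord changes (16 type pairs x 12 semitone steps).

    `chords` is a list of (type, root) pairs, one per smoothed block.
    NOTE: the code numbering below is an assumption; the thesis figure
    only fixes the total of 192 categories.
    """
    counts = np.zeros(192)
    for (t1, r1), (t2, r2) in zip(chords, chords[1:]):
        pair = CHORD_TYPES.index(t1) * 4 + CHORD_TYPES.index(t2)
        step = (r2 - r1) % 12             # root movement in semitones
        counts[pair * 12 + step] += 1
    return counts / duration              # normalize to changes per second
```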
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre dimensions for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to, and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
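The BIC-based model selection described above can be sketched with scikit-learn's GaussianMixture. The toy data below stands in for the 16,800 sampled 12-dimensional timbre frames, and the small component range stands in for the thesis's scan from 10 to 100 components.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in "timbre frames": two well-separated 12-dimensional blobs.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0, 1, (200, 12)),
                    rng.normal(5, 1, (200, 12))])

# Fit GMMs with increasing component counts; keep the lowest-BIC model.
best_gmm, best_bic = None, np.inf
for k in range(1, 6):                              # thesis: range(10, 101)
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_gmm, best_bic = gmm, bic

cluster_means = best_gmm.means_  # one 12-dim mean sound per timbre cluster
```

Each song's timbre frames can then be assigned to their most likely cluster with `best_gmm.predict(...)`, yielding the per-song cluster frequency counts described above.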
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains over 4 times as many features as timbre. While there is no built-in way in scikit-learn's DPGMM implementation to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
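A minimal sketch of this feature assembly, assuming the timbre block is repeated 4 times (the thesis does not state the exact duplication factor; 4 copies roughly balance 192 pitch features against 46 timbre features):

```python
import numpy as np

def song_features(chord_changes, timbre_freqs, timbre_copies=4):
    """Concatenate the 192-dim chord change vector with the 46-dim timbre
    vector, repeating the timbre block to balance the two modalities.
    The copy count is an assumption, not a value stated in the thesis."""
    return np.concatenate([chord_changes,
                           np.tile(timbre_freqs, timbre_copies)])
```

With 4 copies, each song becomes a 192 + 4 · 46 = 376-dimensional feature vector.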
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but the solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it mathematically should not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify different groups into which EM songs are clustered, and to identify the most unique artists and genres. While the second task is fairly simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.
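The clustering runs described above can be sketched as follows. The thesis used scikit-learn's since-removed DPGMM class; in current scikit-learn the equivalent is BayesianGaussianMixture with a Dirichlet Process prior. The toy 2-D data and the scaling by k = 10 stand in for the scaled 238-dimensional song features.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Stand-in song features: small per-second frequencies, scaled by k = 10.
rng = np.random.default_rng(1)
X = 10 * np.vstack([rng.normal(0, 0.05, (150, 2)),
                    rng.normal(1, 0.05, (150, 2))])

for alpha in (0.05, 0.1, 0.2):           # the three thesis settings
    dp = BayesianGaussianMixture(
        n_components=50,                  # upper bound on clusters, as in 3.1
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        random_state=0).fit(X)
    n_used = len(set(dp.predict(X)))      # clusters actually populated
```

Larger α values encourage the process to spread mass over more components, which is why the three settings produced increasing cluster counts on the song data.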
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist industrial space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since neither sounded unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echoing
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats, kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which was particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when those instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together under the common theme of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 stands out significantly because it contains virtually no songs before 1990 but then rises rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up writing subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounds like minimalist acid house music.
While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:
bull Bill Nelson Minimalist house music
bull Vangelis Orchestral compositions with electronic notes
bull Rick Wakeman Rock compositions with spacy-sounding synths
bull Kraftwerk synth-based pop music
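The observation that one timbre category completely dominates a cluster, as with cluster 28, can be checked mechanically rather than by eye. Below is a minimal sketch (the function name, data layout, and threshold are my own, hypothetical choices, not the thesis code): given each song's timbre-category histogram and its cluster label, it averages the histograms per cluster and flags clusters where the peak category carries more than a threshold share of the total mass.

```python
def dominant_timbre_category(timbre_counts, labels, threshold=0.5):
    """For each cluster, average the member songs' timbre-category histograms
    and report (peak category, its share of the mass, whether that share
    exceeds the threshold)."""
    # group histograms by cluster label
    clusters = {}
    for counts, label in zip(timbre_counts, labels):
        clusters.setdefault(label, []).append(counts)
    result = {}
    for label, rows in clusters.items():
        # column-wise mean histogram for the cluster
        mean_hist = [sum(col) / len(rows) for col in zip(*rows)]
        total = sum(mean_hist)
        peak = max(range(len(mean_hist)), key=mean_hist.__getitem__)
        share = mean_hist[peak] / total if total else 0.0
        result[label] = (peak, share, share >= threshold)
    return result
```

A cluster like 28 would show a single peak with a large share, while a noisier cluster's mass would be spread across many categories.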
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters. Three of these clusters contained only one song each, and upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this run was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α runs, like cluster 17 (α = 0.2), which contains Roland TR
drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord-change and timbre charts also reflect the difficulty
of distinguishing the clusters. The y-axes for all of the years are quite small,
implying that many of the timbre values averaged out because the songs in each
cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions
were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters that a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table for α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings for
α = 0.05 and α = 0.1.
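The qualitative pattern observed across the three runs, namely that a larger α yields more and finer clusters, follows from the Dirichlet Process itself. A small simulation of the Chinese Restaurant Process, the sequential view of the DP, makes the effect concrete (this is an illustrative sketch, not the thesis code): each new song joins an existing cluster with probability proportional to that cluster's size, or founds a new cluster with probability proportional to α.

```python
import random

def crp_partition(n, alpha, seed=0):
    """Simulate a Chinese Restaurant Process over n items: item i joins an
    existing table with probability size/(i + alpha), or opens a new table
    with probability alpha/(i + alpha). Returns the list of table sizes."""
    rng = random.Random(seed)
    tables = []  # current cluster sizes
    for i in range(n):
        r = rng.uniform(0, i + alpha)
        if r < alpha or not tables:
            tables.append(1)  # new cluster, probability alpha / (i + alpha)
        else:
            # otherwise pick an existing cluster proportionally to its size
            acc = alpha
            placed = False
            for t in range(len(tables)):
                acc += tables[t]
                if r < acc:
                    tables[t] += 1
                    placed = True
                    break
            if not placed:  # guard against floating-point edge cases
                tables[-1] += 1
    return tables
```

Running this with α comparable to 0.05 versus 0.2 (scaled for the small n here) shows the same trade-off the chapter describes: small α gives a few coarse clusters, large α fragments the data into many subtle ones.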
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; I then offer potential paths for researchers to build upon my
experiment, and I close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, various
factors worked against it, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the
electronic music available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included were
represented by only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criterion
involved iterating through every song and selecting those whose artist carried a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music in rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
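A song-level filter of the kind described might look like the following sketch. The genre list here is abbreviated, and the data layout (a mapping from track IDs to that track's own user-generated tags, as the Last.fm dataset provides) is an assumption for illustration:

```python
# abbreviated stand-in for the thesis's full list of EM genre tags
EM_GENRES = {'house', 'techno', 'trance', 'drum and bass', 'dubstep',
             'breakbeat', 'ambient', 'idm', 'electronic'}

def is_em_song(song_tags, genre_list=EM_GENRES):
    """Keep a track only if one of its OWN tags matches an EM genre,
    instead of admitting every song by an artist with any EM tag."""
    return any(tag.lower() in genre_list for tag in song_tags)

def filter_em_songs(tracks):
    # tracks: {track_id: list of song-level tags for that track}
    return {tid for tid, tags in tracks.items() if is_em_song(tags)}
```

This would exclude, say, a rock single by an artist who also produces house music, which the artist-level tag filter admits.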
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed were nontrivial: I could not only
eyeball the measurements on each timbre graph, as I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements across all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successful the Dirichlet
Process is at clustering songs into distinct categories.
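As a first, much simpler stand-in for the methods of [15], one could compare songs by the cosine similarity of their timbre-category histograms and score each cluster by its average pairwise similarity; a higher score would suggest the cluster groups genuinely similar-sounding songs. A minimal sketch (function names are my own, hypothetical choices):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two timbre-category histograms."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_intra_cluster_similarity(histograms):
    """Average pairwise similarity of the songs in one cluster; higher
    values suggest a more coherent cluster."""
    n = len(histograms)
    if n < 2:
        return 1.0
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(cosine_similarity(histograms[i], histograms[j]) for i, j in pairs)
    return total / len(pairs)
```

Comparing this score across the α = 0.05, 0.1, and 0.2 runs would give a quantitative counterpart to the subjective listening tests used in Chapter 3.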
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether the clustering was effective, and closer musical
scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps
the songs available in the Million Song Dataset, can be addressed with extensions
and modifications to the code base I created. The greater issue of building an
effective corpus of music data for the MSD, and constantly updating it, might be
addressed by soliciting such data from an organization like Spotify, but such an
endeavor is very ambitious and beyond the scope of any individual or small-group
research project without extensive funding and influence. Once these problems are
resolved, and the methods for accessing songs from the dataset and comparing songs
to each other are in place, the next steps would be to further analyze the results.
How do the most unique artists for their time compare to the most popular artists?
Is there considerable overlap? How long does it take for a style to grow in popularity,
if it ever does? And lastly, how can these findings be used to compose new genres of
music and to envision who and what will become popular in the future? All of these
questions may require supplementary information sources, with respect to the
popularity of songs and artists for example, and many of these additional pieces of
information can be found on the website of the MSD.
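The overlap question can be made precise with a simple set statistic, for example the Jaccard index between the top-k most novel artists identified by the clustering and a top-k popularity list drawn from an external source (a hypothetical sketch; the artist lists here are illustrative):

```python
def jaccard_overlap(novel_artists, popular_artists):
    """Fraction of overlap between two artist sets: |A ∩ B| / |A ∪ B|.
    0.0 means disjoint lists, 1.0 means identical lists."""
    a, b = set(novel_artists), set(popular_artists)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Tracking this statistic over time, for instance per decade, would directly answer whether the most innovative artists were also the most popular ones.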
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows and groups such as Spotify amass greater amounts of information, and deeper
insights into that information, this relatively new field of study will hopefully grow
as well. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has, up
until now, mostly been analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for every electronic music song
found in the MSD and writes it to disk sorted by release year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of values
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre-category
counts for each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # (chord class, index): class 1 = major, 2 = minor, 3 = dom7, 4 = min7
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
67
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-35, Dec. 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. the usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is necessarily subjective, as every music listener has his or her own opinion of what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's [8], look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the distribution of different tones: sounds that differ even when they have the same pitch. Therefore segments_timbre is another feature that is considered in each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature, since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly
we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
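The inter-beat statistics considered (and rejected) above would look something like this; the beat times are illustrative rather than taken from a real track:

```python
import numpy as np

beats_start = np.array([0.0, 0.52, 1.05, 1.57, 2.09, 2.61])  # seconds

intervals = np.diff(beats_start)      # distance between adjacent beats
mean_ibi = intervals.mean()           # mean inter-beat interval
std_ibi = intervals.std()             # spread of the beat spacing
implied_tempo_bpm = 60.0 / mean_ibi   # the tempo this spacing implies
```

As the text notes, these summaries become largely redundant once the pitch and timbre counts are themselves normalized per second.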
2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than of the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such
as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
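The filtering procedure just described can be sketched as follows. The hdf5_getters module is the per-field accessor helper distributed with the MSD; the directory-walking function is illustrative only (it needs a local copy of the dataset), while the tag-matching helper is self-contained.

```python
import os

# Abbreviated version of the full target_genres list above.
target_genres = ['house', 'techno', 'drum and bass', 'trance', 'dubstep',
                 'ambient', 'electronic']

def is_electronic(artist_mbtags, targets=target_genres):
    """True if any of the artist's MusicBrainz tags matches a target genre."""
    tags = {str(t).lower() for t in artist_mbtags}
    return any(genre in tags for genre in targets)

def collect_electronic_song_ids(msd_root):
    """Walk an MSD directory tree and keep song IDs whose artist tags match.
    Illustrative: requires a local MSD snapshot and the hdf5_getters module."""
    import hdf5_getters
    kept = []
    for dirpath, _, files in os.walk(msd_root):
        for name in files:
            if not name.endswith('.h5'):
                continue
            h5 = hdf5_getters.open_h5_file_read(os.path.join(dirpath, name))
            try:
                tags = [t.decode('utf-8') for t in
                        hdf5_getters.get_artist_mbtags(h5)]
                if is_electronic(tags):
                    kept.append(hdf5_getters.get_song_id(h5).decode('utf-8'))
            finally:
                h5.close()
    return kept
```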
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed over every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}
where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation runs over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
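A minimal self-contained sketch of this template-matching step is shown below (major triads only; the full procedure also scores minor, dominant 7th, and minor 7th templates, and the normalization here is a Pearson-style stand-in for the rho computation):

```python
import numpy as np

def chord_templates_major():
    """The 12 major-triad templates (root, major third, fifth), C first."""
    base = np.zeros(12)
    base[[0, 4, 7]] = 1.0          # C major = (1,0,0,0,1,0,0,1,0,0,0,0)
    return [np.roll(base, r) for r in range(12)]

def rho(template, chroma):
    """Correlation between a binary chord template and a chroma frame."""
    t = (template - template.mean()) / template.std()
    c = (chroma - chroma.mean()) / chroma.std()
    return float(np.sum(t * c) / 12.0)

def most_likely_major_chord(chroma):
    """Pitch-class index (0 = C) of the best-matching major triad."""
    scores = [rho(t, chroma) for t in chord_templates_major()]
    return int(np.argmax(scores))

# A chroma frame with most energy on C, E, and G matches the C template.
frame = np.array([1.0, .1, .1, .1, .9, .1, .1, .8, .1, .1, .1, .1])
most_likely_major_chord(frame)  # → 0 (C major)
```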
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords; I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly over 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I include a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: converting raw pitch metadata into a chord-change vector, illustrated on "Firestarter" by The Prodigy. Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes (the first 5 time frames are shown). Average the distribution of pitches over every block of 5 time frames. Calculate the most likely chord for each block using Spearman's rho; here the block matches the F# major template (0,1,0,0,0,0,1,0,0,0,1,0). For each two adjacent chords, calculate the change between them (192 possible chord changes) and increment its count in a table of chord-change frequencies; here the shift F# major → G major has chord shift code 6, so chord_changes[6] += 1. The final result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
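This normalization amounts to a single division; the counts and duration below are illustrative:

```python
import numpy as np

chord_counts = np.zeros(192)              # chord-change counts for one song
chord_counts[0], chord_counts[6] = 14, 1  # e.g. 14 major->major, 1 code-6 shift
duration = 212.0                          # song length in seconds
chord_rate = chord_counts / duration      # chord changes per second
```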
2.3.3 Timbre Preprocessing
For timbre I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000: the mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
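The BIC-driven choice of cluster count can be sketched with scikit-learn's GaussianMixture. The frames below are synthetic stand-ins for the 16,800 sampled 12-dimensional timbre frames, and the scan range is shrunk from 10-100 so the example runs quickly.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Three synthetic, well-separated "timbre" clusters in 12 dimensions.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(loc=m, scale=0.5, size=(200, 12))
                    for m in (-2.0, 0.0, 2.0)])

best_k, best_bic, best_gmm = None, np.inf, None
for k in range(1, 7):                      # the thesis scans 10..100
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)                  # lower BIC = better fit/complexity trade-off
    if bic < best_bic:
        best_k, best_bic, best_gmm = k, bic, gmm

cluster_means = best_gmm.means_            # per-cluster mean of the 12 timbre dims
```

With well-separated synthetic clusters, the BIC minimum lands at the true number of components; on the real timbre frames the minimum was at 46.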
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
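The feature assembly can be sketched as below. The duplication count n_dup is a hypothetical choice (the text says only "a certain number of times"); 4 is used here so that the repeated 46-element timbre block roughly matches the 192 pitch features in width.

```python
import numpy as np

def build_features(chord_freqs, timbre_freqs, n_dup=4):
    """Concatenate the 192 chord-change frequencies with the 46 timbre
    frequencies, repeating the timbre block n_dup times to balance it."""
    chord_freqs = np.asarray(chord_freqs, dtype=float)
    timbre_freqs = np.asarray(timbre_freqs, dtype=float)
    assert chord_freqs.shape == (192,) and timbre_freqs.shape == (46,)
    return np.concatenate([chord_freqs] + [timbre_freqs] * n_dup)

features = build_features(np.zeros(192), np.ones(46))
features.shape  # → (376,)
```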
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 up to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but the solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process: extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently stands, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was
an appropriate scaling factor. After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify the different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, putting more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
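These runs can be approximated with scikit-learn's current API. The DPGMM class used in the thesis has since been removed from scikit-learn; BayesianGaussianMixture with a Dirichlet-process prior is its replacement, so this is a sketch under that substitution, run on synthetic data.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Two well-separated synthetic "genres" in a 5-dimensional feature space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)),
               rng.normal(6.0, 1.0, (100, 5))])

clusters_used = {}
for alpha in (0.05, 0.1, 0.2):
    dp = BayesianGaussianMixture(
        n_components=10,                     # upper bound (the thesis used 50)
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,    # the concentration parameter α
        max_iter=500, random_state=0).fit(X)
    clusters_used[alpha] = len(np.unique(dp.predict(X)))  # clusters actually used
```

Larger α makes the process more willing to open new clusters; the number of components is only an upper bound, and the fit decides how many are actually used.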
3.2 Findings

3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of the songs in that cluster is shown below.
Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist industrial space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but with echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on the first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial, metal, and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together by the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but increases rapidly in popularity
This cluster contains songs with hypnotically repetitive rhythm strong and ethereal
synths and an equally strong drum-like beat Given the emergence of trance in
the 1990s and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s this distribution of years makes sense Looking
at the earliest artists in this cluster one that accurately predates the later music
in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and
electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths While the song sounds ambient at its normal speed
playing the song 15 times the normal speed resulted in a thumping fast-paced 16th
note rhythm that combined with the ethereal synths that contain certain chord
progressions sounded very similar to trance music In fact I found that stylistically
trance music was comparable to house and ambient music increased in speed Trance
music was a term not used extensively until the early 1990s but ambient and house
music were already mainstream by the 1980s so it would make sense that trance
evolved in this manner However this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres Lastly we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting the different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant-7th major → dominant-7th major with no note change; and type 180, dominant-7th minor → dominant-7th minor with no note change. It makes sense that the type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more
difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds closest to each timbre category, playing the sounds, and attaching interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. Instead, I compared my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord-change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the MSD data and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with the existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
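For reference, the chord-change type numbers discussed above (0, 60, 120, 180) come out of the 192-category indexing scheme implemented in Appendix A.2. A small sketch, assuming chord qualities are numbered 1 through 4 in the order major, minor, dominant-7th major, dominant-7th minor (an ordering inferred from the type numbers, not stated explicitly in the code):

```python
def chord_change_type(c1, c2):
    """Map a pair of (quality, root) chords to one of 192 change types.

    quality: 1 = major, 2 = minor, 3 = dominant-7th major,
             4 = dominant-7th minor (assumed ordering)
    root:    pitch class 0-11, C through B
    Mirrors the chord_shift formula in Appendix A.2.
    """
    q1, r1 = c1
    q2, r2 = c2
    note_shift = (r2 - r1) % 12       # semitone shift of the root
    key_shift = 4 * (q1 - 1) + q2     # 1..16, one slot per quality pair
    return 12 * (key_shift - 1) + note_shift

# same-chord repetitions land on the four "no note change" types
print(chord_change_type((1, 0), (1, 0)))   # prints 0   (major -> major)
print(chord_change_type((2, 5), (2, 5)))   # prints 60  (minor -> minor)
```

Under this numbering, frequent type 0/60/120/180 observations mean exactly what the text says: consecutive smoothed segments keep the same chord.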
For larger values of α, it is worth not only examining interesting phenomena in the clusters formed at that specific value but also comparing them to the clusters formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between clusters will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these consisted of only one song each, and upon listening, neither song sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some mapped over nicely while others were more difficult to interpret. For example, cluster 3-0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9-0.05 (cluster 9 when α = 0.05). Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3-0.1, they were different from the earliest artists in cluster 9-0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp, twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 clusters differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 clusters picked up more nuanced differences in instrumentation and mood. For example, cluster 16-0.1 contained songs featuring orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different according to traditional genres can be grouped together by certain shared instruments or sounds. Another cluster, 28-0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically in number starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6-0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is likewise a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord-change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 consisted of vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong, repetitive beats reminiscent of the Roland TR synth drum machine, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to share the same peak timbre category, were noted for containing strong, repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
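The pattern above, where raising α produced more and finer-grained clusters, is a general property of the Dirichlet Process prior. A short simulation of the Chinese restaurant process (the sequential clustering scheme a DP induces) illustrates the effect; this sketch demonstrates the prior alone with generic α values, not the thesis pipeline:

```python
import random

def crp_num_clusters(n, alpha, seed=0):
    """Sample the number of clusters the Chinese restaurant process
    induces on n observations with concentration parameter alpha."""
    rng = random.Random(seed)
    counts = []                       # observations per cluster ("table")
    for i in range(n):
        # open a new cluster with probability alpha / (i + alpha),
        # otherwise join an existing cluster proportionally to its size
        if rng.random() < alpha / (i + alpha):
            counts.append(1)
        else:
            r = rng.random() * i
            acc = 0
            for j, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[j] += 1
                    break
    return len(counts)

def avg_clusters(n, alpha, runs=200):
    # average over independent runs: higher alpha -> more clusters
    return sum(crp_num_clusters(n, alpha, seed=s) for s in range(runs)) / runs

print(avg_clusters(2000, 0.5), avg_clusters(2000, 2.0))
```

The expected number of clusters grows roughly like α log(1 + n/α), which is why the α = 0.05, 0.1, and 0.2 runs yield progressively more (and more finely split) clusters.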
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process produced 22 clusters. Three of these contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded those clusters, leaving 19. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters mapped easily to clusters from the other two α values, like cluster 17-0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28-0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also reflect this difficulty: the y-axis values on these charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28-0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each that they represent only a small portion of the dataset. I therefore concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful, picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
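The cross-α comparisons above were done by ear and by eye. One way to quantify how a cluster from one run maps onto another run is a maximum-overlap matching over the contingency table of the two labelings; a minimal sketch with toy label arrays (not the thesis's actual cluster assignments):

```python
from collections import Counter

def best_match(labels_a, labels_b):
    """For each cluster in labeling A, find the cluster in labeling B
    that shares the most songs with it (maximum-overlap matching)."""
    overlap = Counter(zip(labels_a, labels_b))   # (a, b) -> shared song count
    match = {}
    for (a, b), n in overlap.items():
        if a not in match or n > match[a][1]:
            match[a] = (b, n)
    return {a: b for a, (b, n) in match.items()}

# toy labelings for six songs under two runs of the clustering
run_a = [0, 0, 0, 1, 1, 2]
run_b = [5, 5, 7, 7, 7, 7]
print(best_match(run_a, run_b))   # {0: 5, 1: 7, 2: 7}
```

A symmetric alternative is the adjusted Rand index, which scores the agreement of two labelings as a whole rather than cluster by cluster.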
Chapter 4

Conclusion

In this chapter I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment, and close with final remarks on this thesis.
4.1 Design Flaws in Experiment

While I made every effort to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that the dataset had some credibility. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason I consider my thesis a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process can be used with some success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from Last.fm, which contains user-generated tags at the song level.
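The selection criterion described here, keeping a song when its artist carries at least one tag from a predetermined EM genre list, can be sketched as a set-membership test. The genre subset below is drawn from the target_genres list in Appendix A.1; note that the appendix code actually uses a substring test against the raw tag string, so this is a cleaned-up approximation:

```python
EM_GENRES = {'house', 'techno', 'trance', 'ambient', 'breakbeat',
             'industrial', 'synthpop', 'idm', 'downtempo', 'electronic'}

def is_em_song(tags, genre_list=EM_GENRES):
    """Return True if any of the song's (or its artist's) tags
    falls inside the predetermined EM genre list."""
    return any(tag.lower() in genre_list for tag in tags)

print(is_em_song(['Trance', 'dance']))      # True: 'trance' is on the list
print(is_em_song(['rock', 'blues rock']))   # False: no EM tag
```

Applying the same test to per-song Last.fm tags, rather than artist-level MusicBrainz tags, would address the misclassification issue described above.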
Another, more addressable weakness in my experiment was the graphical analysis of the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection included careful preprocessing, based on other studies, that normalized measurements across songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes distinct genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs are in place, the next steps would be to analyze the results further. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and to envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights into that information, this relatively new field of study will hopefully grow with it. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import hdf5_getters  # not on adroit
import numpy as np

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of each electronic song out of the MSD
HDF5 files and writes it out sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists (applied via map over zip below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and the per-song timbre-category counts.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overrides the cluster path for local runs
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

def _bit_rows(rows):
    # expand '100010010000'-style pitch-class strings into 12-element 0/1 vectors
    return [[int(c) for c in row] for row in rows]

CHORD_TEMPLATE_MAJOR = _bit_rows(['100010010000', '010001001000',
                                  '001000100100', '000100010010',
                                  '000010001001', '100001000100',
                                  '010000100010', '001000010001',
                                  '100100001000', '010010000100',
                                  '001001000010', '000100100001'])
CHORD_TEMPLATE_MINOR = _bit_rows(['100100010000', '010010001000',
                                  '001001000100', '000100100010',
                                  '000010010001', '100001001000',
                                  '010000100100', '001000010010',
                                  '000100001001', '100010000100',
                                  '010001000010', '001000100001'])
CHORD_TEMPLATE_DOM7 = _bit_rows(['100010010010', '010001001001',
                                 '101000100100', '010100010010',
                                 '001010001001', '100101000100',
                                 '010010100010', '001001010001',
                                 '100100101000', '010010010100',
                                 '001001001010', '000100100101'])
CHORD_TEMPLATE_MIN7 = _bit_rows(['100100010010', '010010001001',
                                 '101001000100', '010100100010',
                                 '001010010001', '100101001000',
                                 '010010100100', '001001010010',
                                 '000100101001', '100010010100',
                                 '010001001010', '001000100101'])

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
2.05496297e+00 5.07031622e+00 -3.52663849e-02 -5.68337901e+00 -1.17825301e+00 5.41756637e-01 -3.15541339e-02 -1.58404846e+00 7.37887234e-01]
 [ 2.36033237e-01 -5.01380019e-01 -7.01568834e-02 -2.14474169e-01 5.58739133e-01 -3.45340886e-01 2.36469930e-01 -2.51770230e-02 -4.41670143e-01 -1.73364633e-01 9.92353986e-03 1.01775476e-01]
 [ 3.13672832e+00 1.55128891e+00 4.60139512e+00 9.82477544e-01 -3.87108002e-01 -1.34239667e+00 -3.00065797e+00 -4.41556909e-01 -7.77546208e-01 -6.59017029e-01 -1.42596356e-01 -9.78935498e-01]
 [ 8.50714148e-01 2.28658856e-01 -3.65260753e+00 2.70626948e+00 -1.90441544e-01 5.66625676e+00 1.77531510e+00 2.39978921e+00 1.10965660e+00 1.58484130e+00 -1.51579214e-02 8.64324026e-01]
 [ 1.14302559e+00 1.18602811e+00 -3.88130412e+00 8.69833825e-01 -8.23003310e-01 -4.23867795e-01 8.56022598e-01 -1.08015106e+00 1.74840192e-01 -1.35493558e-02 -1.17012561e+00 1.68572940e-01]
 [ 3.54117814e+00 6.12714769e-01 7.67585243e+00 2.50391333e+00 1.81374399e+00 -1.46363231e+00 -1.74027236e+00 -5.72924078e-01 -1.20787368e+00 -4.13954661e-01 -4.62561948e-01 6.78297871e-01]
 [ 8.31843044e-01 4.41635485e-01 7.00724425e-02 -4.72159900e-02 3.08326493e-01 -4.47009822e-01 3.27806057e-01 6.52370380e-01 3.28490360e-01 1.28628172e-01 -7.78065861e-02 6.91343399e-02]
 [ 4.90082031e-01 -9.53180204e-01 1.76970476e-01 1.57256960e-01 -5.26196238e-02 -3.19264458e-01 3.91808304e-01 2.19368239e-01 -2.06483291e-01 -6.25044005e-02 -1.05547224e-01 3.18934196e-01]
 [ 1.49899454e+00 -4.30708817e-01 2.43770498e+00 7.03149621e-01 -2.28827845e+00 2.70195855e+00 -4.71484280e+00 -1.18700075e+00 -1.77431396e+00 -2.23190236e+00 8.20855264e-01 -2.35859902e-01]
 [ 1.20322544e-01 -3.66300816e-01 -1.25699953e-01 -1.21914056e-01 6.93277338e-02 -1.31034684e-01 -1.54955924e-03 2.48094288e-02 -3.09576314e-02 -1.66369415e-03 1.48904987e-04 -1.42151992e-02]
 [ 6.52394765e-01 -6.81024464e-01 6.36868117e-01 3.04950208e-01 2.62178992e-01 -3.20457080e-01 -1.98576098e-01 -3.02173163e-01 2.04399765e-01 4.44513847e-02 -9.50111498e-02 -1.14198739e-02]
 [ 2.06762180e-01 -2.08101829e-01 2.61977630e-01 -1.71672300e-01 5.61794250e-02 2.13660185e-01 3.90259585e-02 4.78176392e-02 1.72812607e-02 3.44052067e-02 6.26899067e-03 2.48544728e-02]
 [ 7.39717363e-01 4.37786285e+00 2.54995502e+00 1.13151212e+00 -3.58509503e-01 2.20806129e-01 -2.20500355e-01 -7.22409824e-02 -2.70534083e-01 1.07942098e-03 2.70174668e-01 1.87279353e-01]
 [ 1.25593809e+00 6.71054880e-02 8.70352571e-01 -4.32607959e+00 2.30652217e+00 5.47476105e+00 -6.11052479e-01 1.07955720e+00 -2.16225471e+00 -7.95770149e-01 -7.31804973e-01 9.68935954e-01]
 [ 1.17233757e-01 -1.23897829e-01 -4.88625265e-01 1.42036530e-01 -7.23286756e-02 -6.99808763e-02 -1.17525019e-02 5.70221674e-03 -7.67796123e-03 4.17505873e-02 -2.33375716e-02 1.94121001e-02]
 [ 1.67511025e+00 -2.75436700e+00 1.45345593e+00 1.32408871e+00 -1.66172505e+00 1.00560074e+00 -8.82308160e-01 -5.95708043e-01 -7.27283590e-01 -1.03975499e+00 -1.86653334e-02 1.39449745e+00]
 [ 3.20587677e+00 -2.84451104e+00 8.54849957e+00 -4.44001235e-01 1.04202144e+00 7.35333682e-01 -2.48763292e+00 7.38931361e-01 -1.74185596e+00 -1.07581842e+00 2.05759299e-01 -8.20483513e-01]
 [ 3.31279737e+00 -5.08655734e-01 6.61530870e+00 1.16518280e+00 4.74499155e+00 -2.31536191e+00 -1.34016130e+00 -7.15381712e-01 2.78890594e+00 2.04189275e+00 -3.80003033e-01 1.16034914e+00]
 [ 1.79522019e+00 -8.13534697e-02 4.37167420e-01 2.26517020e+00 8.85377295e-01 1.07481514e+00 -7.25322296e-01 -2.19309506e+00 -7.59468916e-01 -1.37191387e+00 2.60097913e-01 9.34596450e-01]
 [ 3.50400906e-01 8.17891485e-01 -8.63487084e-01 -7.31760701e-01 9.70320805e-02 -3.60023996e-01 -2.91753495e-01 -8.03073817e-02 6.65930095e-02 1.60093340e-01 -1.29158086e-01 -5.18806100e-02]
 [ 2.25922929e-01 2.78461593e-01 5.39661393e-02 -2.37662670e-02 -2.70343295e-02 -1.23485570e-01 2.31027499e-03 5.87465112e-05 1.86127188e-02 2.83074747e-02 -1.87198676e-04 1.24761782e-02]
 [ 4.53615634e-01 3.18976020e+00 -8.35029351e-01 7.84124578e+00 -4.43906795e-01 -1.78945492e+00 -1.14521031e+00 1.00044304e+00 -4.04084981e-01 -4.86030348e-01 1.05412721e-01 5.63666445e-02]
 [ 3.93714086e-01 -3.07226477e-01 -4.87366619e-01 -4.57481697e-01 -2.91133171e-04 -2.39881719e-01 -2.15591352e-01 -1.21332941e-01 1.42245002e-01 5.02984582e-02 -8.05878851e-03 1.95534173e-01]
 [ 1.86913010e-01 -1.61000977e-01 5.95612425e-01 1.87804293e-01 2.22064227e-01 -1.09008289e-01 7.83845058e-02 5.15228647e-02 -8.18113578e-02 -2.37860551e-02 3.41013800e-03 3.64680417e-02]
 [ 3.32919314e+00 -2.14341251e+00 7.20913997e+00 1.76143734e+00 1.64091808e+00 -2.66887649e+00 -9.26748006e-01 -2.78599285e-01 -7.39434005e-01 -3.87363085e-01 8.00557250e-01 1.15628886e+00]
 [ 4.76496444e-01 -1.19334793e-01 3.09037235e-01 -3.45545294e-01 1.30114716e-01 5.06895559e-01 2.12176840e-01 -4.14296750e-03 4.52439064e-02 -1.62163990e-02 6.93683152e-02 -5.77607592e-03]
 [ 3.00019324e-01 5.43432074e-02 -7.72732930e-01 1.47263806e+00 -2.79012581e-02 -2.47864869e-01 -2.10011388e-01 2.78202425e-01 6.16957205e-02 -1.66924986e-01 -1.80102286e-01 -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw msd data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (chord type, root)
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener has his or her own opinion as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:
Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval
Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)
Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate as a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's [8], look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures tones and sounds that differ audibly despite having the same pitch. Therefore segments_timbre is another feature that is considered for each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear useful, as they indicate the rhythm of a song in one way or another. However, none of them are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second", so faster songs will have higher quantities of pitch and timbre features
each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than of the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
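The filtering step above can be sketched as a small predicate over an artist's tag list. This is a minimal sketch, not the thesis's exact script: `is_target_electronic` is a hypothetical helper name, and in the real pipeline the tags would come from the MSD's `hdf5_getters.get_artist_mbtags(h5)` for each song file.

```python
# Sketch of the genre filter described above. TARGET_GENRES mirrors the
# thesis's target_genres list; is_target_electronic is a hypothetical helper.
TARGET_GENRES = {
    'house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
    'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic',
}

def is_target_electronic(artist_mbtags):
    """Return True if any artist-level MusicBrainz tag matches a target genre."""
    return any(str(tag).strip().lower() in TARGET_GENRES for tag in artist_mbtags)
```

A song would be kept whenever this predicate holds for its artist's tag list, which is exactly the artist-level (rather than song-level) test whose false positives are discussed above.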
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed over every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)

where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".
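As a concrete illustration of the template-matching step, the sketch below rotates a single major template through all 12 roots and picks the root whose correlation with the observed chroma frame is strongest. It follows the ρ formula above (with the small ε guard used in the appendix code) rather than a library Spearman implementation, and `best_major_root` is a hypothetical helper name for illustration only.

```python
import numpy as np

# C major template: 1s at C, E, G (index 0 = C).
C_MAJOR = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)

def rho(template, chroma, eps=0.01):
    # Correlation between a template chord and an observed chroma frame,
    # following the thesis formula; eps guards against zero deviation.
    num = np.sum((template - template.mean()) * (chroma - chroma.mean()))
    return num / ((template.std() + eps) * (chroma.std() + eps))

def best_major_root(chroma):
    """Return the root (0 = C, ..., 11 = B) of the best-matching major triad."""
    chroma = np.asarray(chroma, dtype=float)
    scores = [rho(np.roll(C_MAJOR, r), chroma) for r in range(12)]
    return int(np.argmax(np.abs(scores)))
```

The full procedure does the same rotation-and-compare over all four chord-type templates, as in `find_most_likely_chord` in Appendix A.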
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: converting raw pitch metadata into a chord change vector, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho; for the first block this is F♯ major, template (0,1,0,0,0,0,1,0,0,0,1,0). (4) For each pair of adjacent chords, calculate the change between them (e.g. F♯ major to G major: major-to-major, step size 2, chord shift code 6) and increment its count in a table of chord change frequencies (192 possible changes). (5) The result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
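For illustration, one way the 192 chord-change codes could be enumerated is as 4 × 4 chord-type pairs times 12 root intervals. The thesis does not spell out its exact code assignment, so the mapping below is an assumed encoding for the sketch, not necessarily the one used to produce the figures.

```python
# Assumed encoding of a chord change as an integer code in [0, 192).
# A chord is a (type, root) pair with type in 1..4 (major, minor, dom7, min7)
# and root in 0..11, matching the tuples returned by find_most_likely_chord.
def chord_change_code(prev_chord, next_chord):
    (t1, r1), (t2, r2) = prev_chord, next_chord
    interval = (r2 - r1) % 12                  # root movement in semitones
    return ((t1 - 1) * 4 + (t2 - 1)) * 12 + interval
```

A song's pitch features are then a 192-element histogram built by incrementing `chord_changes[chord_change_code(a, b)]` for each adjacent chord pair, divided at the end by the song's duration as described above.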
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost definitely include
a disproportionate amount of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes, whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
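The GMM-with-BIC selection described above can be sketched with scikit-learn's `GaussianMixture`. This is a toy-scale sketch: synthetic 2-D blobs stand in for the 16,800 × 12 matrix of timbre frames, a small candidate range stands in for 10 to 100, and `best_gmm_by_bic` is a hypothetical helper name.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm_by_bic(X, candidate_counts):
    """Fit a GMM for each candidate component count; keep the lowest-BIC fit."""
    fits = [GaussianMixture(n_components=k, random_state=0).fit(X)
            for k in candidate_counts]
    bics = [m.bic(X) for m in fits]
    return fits[int(np.argmin(bics))]

# Synthetic stand-in for the timbre frames: three well-separated blobs.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(c, 0.1, size=(100, 2)) for c in (-2.0, 0.0, 2.0)])
model = best_gmm_by_bic(frames, range(1, 6))
```

In the thesis the BIC minimum fell at 46 components; the fitted cluster means (`model.means_`) then play the role of the TIMBRE_CLUSTERS table in Appendix A.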
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
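A minimal sketch of this weighting strategy, assuming four timbre copies (so 184 timbre features against 192 pitch features); the duplication count is a tunable choice not fixed by the thesis, and `build_feature_vector` is a hypothetical helper name.

```python
import numpy as np

def build_feature_vector(chord_changes, timbre_freqs, timbre_copies=4):
    """Concatenate the 192 chord-change frequencies with several copies of
    the 46 timbre-category frequencies to balance the two feature groups."""
    chord_changes = np.asarray(chord_changes, dtype=float)  # length 192
    timbre_freqs = np.asarray(timbre_freqs, dtype=float)    # length 46
    return np.concatenate([chord_changes] + [timbre_freqs] * timbre_copies)
```

Because the mixture model treats every coordinate equally, repeating the timbre block effectively multiplies timbre's contribution to each song's position in feature space without changing any individual value.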
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but the solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with
an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it mathematically should not be used. Therefore,
multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor. After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
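The scaling and clustering steps above can be sketched as follows. One caveat: the DPGMM class the thesis used has since been replaced in scikit-learn by `BayesianGaussianMixture` with a Dirichlet-process weight prior, so this is an approximation of the original setup, with α mapped to `weight_concentration_prior` and the 50-cluster cap to the truncation level.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def cluster_songs(features, alpha, k=10, max_clusters=50, seed=0):
    """Scale per-second frequencies by k, then fit a truncated DP mixture."""
    X = np.asarray(features, dtype=float) * k
    dp = BayesianGaussianMixture(
        n_components=max_clusters,                       # truncation level
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,                # the alpha being tuned
        random_state=seed,
    )
    return dp.fit_predict(X)
```

Only the components that actually receive songs matter; with small α most of the 50 allowed components stay empty, which is how runs with α = 0.05, 0.1, and 0.2 end up with far fewer than 50 occupied clusters.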
The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, putting more semantic interpretations on what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster Song Count Characteristic Sounds
0 6481 Minimalist industrial space sounds dissonant chords
1 5482 Soft New Age ethereal
2 2405 Defined sounds electronic and non-electronic instru-
ments played in standard rock rhythms
3 360 Very dense and complex synths slightly darker tone
4 4550 Heavily distorted rock and synthesizer
6 2854 Faster paced 80s synth rock acid house
8 798 Aggressive beats dense house music
9 1464 Ambient house trancelike strong beats mysterious
tone
11 1597 Melancholy tones New wave rock in 80s then starting
in 90s downtempo trip-hop nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs and, as they did not sound unique, I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster Song Count Characteristic Sounds
0 1339 Instrumental and disco with 80s synth
1 2109 Simultaneous quarter-note and sixteenth note rhythms
2 4048 Upbeat chill simultaneous quarter-note and eighth
note rhythms
3 1353 Strong repetitive beats ambient
4 2446 Strong simultaneous beat and synths synths defined but
echo
5 2672 Calm New Age
6 542 Hi-hat cymbals dissonant chord progressions
7 2725 Aggressive punk and alternative rock
9 1647 Latin rhythmic emphasis on first and third beats
11 835 Standard medium-fast rock instrumentschords
16 1152 Orchestral especially violins
18 40 ldquoMartian alienrdquo sounds no vocals
20 1590 Alternating strong kick and strong high-pitched clap
28 528 Roland TR-like beats kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster Song Count Characteristic Sounds
0 4075 Nostalgic and sad-sounding synths and string instru-
ments
1 2068 Intense sad cavernous (mix of industrial metal and am-
bient)
2 1546 Jazzfunk tones
3 1691 Orchestral with heavy 80s synths atmospheric
4 343 Arpeggios
5 304 Electro ambient
6 2405 Alien synths eery
7 1264 Punchy kicks and claps 80s90s tilt
8 1561 Medium tempo 44 time signature synths with intense
guitar
9 1796 Disco rhythms and instruments
10 2158 Standard rock with few (if any) synths added on
12 791 Cavernous minimalist ambient (non-electronic instru-
ments)
14 765 Downtempo classic guitar riffs fewer synths
16 865 Classic acid house sounds and beats
17 682 Heavy Roland TR sounds
22 14 Fast ambient classic orchestral
23 578 Acid house with funk tones
30 31 Very repetitive rhythms one or two tones
34 88 Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
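The clustering runs discussed in this section can be sketched with scikit-learn's BayesianGaussianMixture, whose weight_concentration_prior plays the role of α in a truncated Dirichlet Process mixture. This is an illustrative sketch on synthetic data, not the thesis's actual pipeline: the feature matrix, truncation level, and cluster structure below are all invented.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in for the per-song feature vectors (chord-change
# frequencies plus timbre-category counts in the real experiment).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(200, 4))
               for m in (0.0, 3.0, 6.0)])

def cluster_songs(X, alpha):
    """Fit a truncated Dirichlet Process Gaussian mixture with concentration alpha."""
    dpgmm = BayesianGaussianMixture(
        n_components=30,                       # truncation level
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,      # the alpha varied in the text
        covariance_type="diag",
        max_iter=500,
        random_state=0,
    )
    return dpgmm.fit_predict(X)

# Larger alpha tends to spread weight over more components, consistent with
# the alpha = 0.2 run producing more (and more fragmented) clusters.
labels = cluster_songs(X, alpha=0.1)
print(len(set(labels)))  # number of occupied clusters
```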
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together around a common theme of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more typical year distribution relative to the entire MSD and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster.

Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
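The chord-change type numbers cited earlier (0, 60, 120, and 180 for the four same-key, same-root transitions) come from the index arithmetic used in the preprocessing code of Appendix A.2: with key types 1 through 4 (major, minor, dominant 7th major, dominant 7th minor) and root notes 0 through 11, a transition maps to category 12·(key_shift − 1) + note_shift, where key_shift = 4·(k1 − 1) + k2. A minimal sketch of that encoding:

```python
def chord_change_category(k1, n1, k2, n2):
    """Map a chord transition to one of the 192 chord-change categories.

    k1, k2: key type 1..4 (major, minor, dom-7th major, dom-7th minor)
    n1, n2: root note 0..11 (C natural up through B natural)
    """
    if n1 == n2:
        note_shift = 0
    elif n1 < n2:
        note_shift = n2 - n1
    else:
        note_shift = 12 - n1 + n2        # wrap around the octave
    key_shift = 4 * (k1 - 1) + k2        # 1..16 key-type pairs
    return 12 * (key_shift - 1) + note_shift

# The four same-key, same-note transitions named in the text:
assert chord_change_category(1, 0, 1, 0) == 0    # major -> major
assert chord_change_category(2, 0, 2, 0) == 60   # minor -> minor
assert chord_change_category(3, 0, 3, 0) == 120  # dom7 major -> dom7 major
assert chord_change_category(4, 0, 4, 0) == 180  # dom7 minor -> dom7 minor
```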
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those of other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
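The cross-α comparisons above were done qualitatively, by matching year distributions and characteristic sounds. A quantitative complement, not part of the original analysis, would be the adjusted Rand index between the labels two runs assign to the same songs; the label vectors below are invented for illustration.

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical cluster labels for the same ten songs under two alpha values;
# the real labels would come from the two Dirichlet Process runs.
labels_a005 = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
labels_a01  = [5, 5, 5, 2, 2, 7, 7, 7, 7, 1]

# 1.0 means the two partitions agree exactly (up to relabeling);
# values near 0 mean agreement no better than chance.
score = adjusted_rand_score(labels_a005, labels_a01)
print(round(score, 3))
```

A high score between, say, cluster 17_0.2 and 28_0.1 restricted to their shared songs would corroborate the "Roland TR" mapping claimed above.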
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α runs, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters. The y-axis values are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies for resolving them; I then offer potential paths for researchers to build upon my experiment and give closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
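A song-level filter of this kind could be sketched as below, assuming the Last.fm tags have already been loaded into a dict keyed by MSD track ID; the track IDs, tags, and genre list shown are made up for illustration.

```python
TARGET_GENRES = {"house", "techno", "trance", "ambient", "breakbeat"}

# Hypothetical per-track tags as a Last.fm-style dataset would supply them;
# filtering per song avoids sweeping in an artist's non-EM catalogue.
track_tags = {
    "TRAAAAA1": ["trance", "dance"],
    "TRAAAAA2": ["rock", "classic rock"],   # same artist, non-EM song
    "TRAAAAA3": ["ambient", "chillout"],
}

def is_em_track(track_id, tags_by_track, target_genres=TARGET_GENRES):
    """Keep a song only if one of its own tags names a target EM genre."""
    tags = tags_by_track.get(track_id, [])
    return any(tag.lower() in target_genres for tag in tags)

kept = [t for t in track_tags if is_em_track(t, track_tags)]
print(kept)  # -> ['TRAAAAA1', 'TRAAAAA3']
```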
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
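One concrete version of such a song-to-song comparison, not implemented in this thesis, would be cosine similarity between duration-normalized timbre-category histograms; the histograms below are invented for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two timbre-category count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Invented timbre-category histograms (counts per category / duration),
# standing in for the 'timbre_cat_counts' feature built in Appendix A.2.
song_a = [0.4, 0.1, 0.0, 0.5]
song_b = [0.38, 0.12, 0.0, 0.5]   # nearly the same sound palette
song_c = [0.0, 0.0, 0.9, 0.1]     # dominated by a different category

assert cosine_similarity(song_a, song_b) > cosine_similarity(song_a, song_c)
```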
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, the dataset songs are accessible, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
from collections import OrderedDict
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for each electronic music song
in one shard of the Million Song Dataset.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# an OrderedDict (rather than a plain dict) preserves the chronological order
all_song_data_sorted = OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
                    1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
                    1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
                    1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
                    1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
                    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
                    2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [[1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03,
                    -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02,
                    1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
                   [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00,
                    2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01,
                    6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
                   [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00,
                    9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03,
                    3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
                   [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01,
                    -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02,
                    2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
                   [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,
                    1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01,
                    -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
                   [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01,
                    1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02,
                    3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
                   [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02,
                    -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02,
                    1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
                   [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,
                    4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02,
                    2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
                   [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01,
                    -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02,
                    1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
                   [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02,
                    -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04,
                    2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
                   [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02,
                    -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02,
                    3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
                   [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01,
                    -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02,
                    3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
                   [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01,
                    1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02,
                    -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
                   [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00,
                    -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00,
                    -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
                   [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01,
                    3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04,
                    4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
                   [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00,
                    2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01,
                    1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
                   [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03,
                    -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02,
                    6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
                   [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02,
                    -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02,
                    -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
                   [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02,
                    1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01,
                    -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
                   [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00,
                    5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00,
                    5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
                   [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,
                    5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02,
                    -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
                   [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01,
                    -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01,
                    -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
                   [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00,
                    -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00,
                    1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
                   [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01,
                    -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00,
                    1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
                   [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00,
                    1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01,
                    -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
                   [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02,
                    3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01,
                    3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
                   [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01,
                    -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01,
                    -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
                   [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01,
                    -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00,
                    -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
                   [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,
                    6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02,
                    -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
                   [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01,
                    2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01,
                    2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
                   [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01,
                    5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02,
                    1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
                   [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00,
                    -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-03,
                    -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
                   [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00,
                    2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00,
                    -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
                   [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01,
                    -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02,
                    -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
                   [1.67511025e+00, -2.75436700e+00, 1.45345593e+00,
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw MSD data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
'''given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec 2005.
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the presence of different tones: sounds that sound different despite having the same pitch. Therefore segments_timbre is another feature that is considered in each song.
Finally, we look at the candidate features for rhythm. At first glance, all of these features appear useful, as they indicate the rhythm of a song in one way or another. However, none of them are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly
we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significant new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such
as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
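As a sketch, the tag filter reduces to a membership test against this list; the helper name and lowercase normalization below are illustrative assumptions, not the exact collection code:

```python
# Illustrative sketch of the artist-tag filter described above; a song is
# kept if any of its artist's MusicBrainz tags matches an EM genre.
TARGET_GENRES = {'house', 'techno', 'drum and bass', 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial',
                 'synthpop', 'idm', '8-bit', 'ambient',
                 'dance and electronica', 'electronic'}

def is_electronic(artist_mbtags):
    """True if any artist-level tag matches an electronic music genre."""
    return any(tag.lower() in TARGET_GENRES for tag in artist_mbtags)
```

In the actual pipeline, the tag list for each song would come from hdf5_getters on the MSD's HDF5 files.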
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. It then computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}
where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of those values, and the operations on c are analogous. Note that the summation runs over each of the 12 pitch classes. The chord template with the highest value of \rho is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
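The template-matching step can be sketched as follows; the C-rooted templates are for brevity only (the full procedure scores all transpositions of the four chord families), and the function names are illustrative:

```python
# Score an observed chroma frame against chord templates with the
# correlation coefficient quoted above, and keep the best match.
TEMPLATES = {
    'C major': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'C minor': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'C dom7':  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    'C min7':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}

def rho(template, chroma):
    """Correlation between a 0/1 chord template and a chroma frame."""
    n = len(template)
    t_mean = sum(template) / n
    c_mean = sum(chroma) / n
    t_std = (sum((t - t_mean) ** 2 for t in template) / n) ** 0.5
    c_std = (sum((c - c_mean) ** 2 for c in chroma) / n) ** 0.5
    cov = sum((t - t_mean) * (c - c_mean)
              for t, c in zip(template, chroma)) / n
    return cov / (t_std * c_std)

def best_chord(chroma):
    """Return the template name with the highest correlation."""
    return max(TEMPLATES, key=lambda name: rho(TEMPLATES[name], chroma))
```

A frame with energy concentrated on C, E, and G will match the C major template rather than the dominant-7 template, whose B-flat slot finds no support.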
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: converting raw pitch metadata into a chord change vector, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every block of 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho (e.g., F# major). (4) For each pair of adjacent chords, calculate the change between them and increment its count in a table of chord change frequencies (192 possible chord changes), yielding a final 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's chord changes were measured per second.
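Sketched as code, this normalization is one division per entry (the duration argument is assumed to be in seconds):

```python
def per_second(chord_change_counts, duration_seconds):
    """Normalize raw chord-change counts to changes per second,
    so songs of different lengths are directly comparable."""
    return [count / duration_seconds for count in chord_change_counts]
```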
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample over all years at once, was to prevent bias towards any one type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but with different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
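The year-stratified draw described above can be sketched as follows; the `songs_by_year` mapping and the fixed seed are illustrative assumptions:

```python
import random

def stratified_sample(songs_by_year, per_year=20, seed=0):
    """Draw per_year songs from each year, so that sounds more common in
    older songs are not swamped by the post-2000 bulk of the dataset."""
    rng = random.Random(seed)
    sample = []
    for year in sorted(songs_by_year):
        pool = songs_by_year[year]
        # take at most per_year songs from this year's pool
        sample.extend(rng.sample(pool, min(per_year, len(pool))))
    return sample
```

The same idea (a fixed quota per song) is then applied again when drawing 20 timbre frames from each sampled song.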
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
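Sketched as code, the per-song feature vector is a plain concatenation; the duplication count `k` below is an illustrative assumption (the text does not fix its value):

```python
def build_features(chord_freqs, timbre_freqs, k=4):
    """Concatenate the 192 chord-change frequencies with k copies of the
    46 timbre-category frequencies, so timbre is not outweighed roughly
    3:1 by pitch in the clustering."""
    return list(chord_freqs) + list(timbre_freqs) * k
```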
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and often hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently stands, we would be using the Dirichlet Process in a way it mathematically should not be used. Therefore, multiplying all of the data by a constant, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor. After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify the different groups into which EM songs cluster, and to identify the most unique artists and genres. While the second task is simple, because it only requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters, respectively.
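The role of α can be illustrated with a toy Chinese Restaurant Process draw, the sequential view of the Dirichlet Process, in which larger α opens new clusters more readily. This is a didactic sketch, not the scikit-learn variational fit used in this chapter:

```python
import random

def crp_cluster_counts(n, alpha, seed=0):
    """Seat n points by the Chinese Restaurant Process: each point joins
    an existing cluster with probability proportional to its size, or
    opens a new cluster with probability proportional to alpha.
    Returns the number of clusters formed."""
    rng = random.Random(seed)
    sizes = []  # sizes of existing clusters
    for i in range(n):
        # total weight: i points already seated, plus alpha for a new cluster
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for j, s in enumerate(sizes):
            acc += s
            if r < acc:
                sizes[j] += 1
                break
        else:
            sizes.append(1)  # open a new cluster
    return len(sizes)
```

With small α (e.g., 0.05) almost everything lands in one cluster; with α in the hundreds, the process keeps spawning new clusters, which mirrors the behavior observed when tuning α above.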
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster paced, 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since neither sounded unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but with echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial, metal, and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s, and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
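The role of α can be made concrete with a quick calculation. Under the Chinese Restaurant Process view of the Dirichlet Process, the prior expected number of clusters among n songs is the sum of α/(α + i) for i = 0, ..., n - 1, which grows roughly like α log n. The following is a sketch of the prior alone; the posterior cluster counts reported in this chapter also depend on the pitch and timbre likelihoods, so these numbers will not match the 10, 16, and 22 clusters actually observed, but they illustrate why raising α from 0.05 to 0.2 yields more clusters.

```python
def expected_clusters(n, alpha):
    """Expected cluster count under a Chinese Restaurant Process prior:
    the i-th song starts a new cluster with probability alpha/(alpha+i)."""
    return sum(alpha / (alpha + i) for i in range(n))

# roughly the number of EM songs analyzed in this thesis
n_songs = 23000
for alpha in (0.05, 0.1, 0.2):
    print(alpha, round(expected_clusters(n_songs, alpha), 2))
```

The prior-only expectation is monotone in α, matching the qualitative trend in the experiments.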
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together by the common theme of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then rises rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their characteristic chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, playing the sounds, and attaching interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music; complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
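The chord-change type numbers cited in this section (0, 60, 120, 180) follow the encoding used in the preprocessing code of Appendix A.2, where each chord is a (quality, root) pair and each transition maps to one of 192 categories. A minimal sketch of that encoding (the tuple convention mirrors what find_most_likely_chord appears to return; quality 1 = major, 2 = minor, 3 = dominant 7th major, 4 = dominant 7th minor):

```python
def chord_shift(c1, c2):
    """Map a chord transition to one of 192 categories, as in Appendix A.2.
    Each chord is (quality, root): quality 1=major, 2=minor, 3=dom. 7th major,
    4=dom. 7th minor; root 0-11 for the pitch classes C through B."""
    q1, r1 = c1
    q2, r2 = c2
    note_shift = (r2 - r1) % 12    # upward pitch-class distance, 0..11
    key_shift = 4 * (q1 - 1) + q2  # ordered quality pair, 1..16
    return 12 * (key_shift - 1) + note_shift

# same-chord transitions land exactly on the types discussed above
print(chord_shift((1, 0), (1, 0)))  # major -> major, no note change: 0
print(chord_shift((2, 5), (2, 5)))  # minor -> minor, no note change: 60
```

The %12 expression is equivalent to the explicit three-way branch in Appendix A.2 (equal roots give 0; otherwise the upward distance wraps around the octave).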
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
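The speed-up experiment mentioned above can be approximated in code. A minimal sketch, assuming the track is available as a mono list of samples: naive linear-interpolation resampling, played back at the original sample rate, raises tempo and pitch together by the chosen factor (a tempo-only shift, preserving pitch, would need a phase vocoder instead).

```python
import math

def speed_up(signal, factor=1.5):
    """Linear-interpolation resampling: playing the result at the original
    sample rate sounds `factor` times faster and proportionally higher-pitched."""
    n_out = int(len(signal) / factor)
    out = []
    for j in range(n_out):
        # position in the original signal corresponding to output sample j
        x = j * (len(signal) - 1) / max(n_out - 1, 1)
        i = int(x)
        frac = x - i
        nxt = signal[min(i + 1, len(signal) - 1)]
        out.append(signal[i] * (1 - frac) + nxt * frac)
    return out

# a 3-second 110 Hz test tone at a 22050 Hz sample rate
sr = 22050
tone = [math.sin(2 * math.pi * 110 * n / sr) for n in range(3 * sr)]
fast = speed_up(tone, 1.5)
print(len(tone), len(fast))  # 66150 44100
```

At 1.5x, a 3-second clip shrinks to 2 seconds, and every frequency component, including the hi-hat rhythm's rate, scales up by the same factor.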
For larger values of α, it is worth looking not only at interesting phenomena in the clusters formed for that specific value, but also at how those clusters compare to the ones formed for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3(0.1) (i.e., cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9(0.05). Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3(0.1), they were different from the earliest artists in cluster 9(0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16(0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28(0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6(0.1), has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong, repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to share the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music
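The year-distribution shapes discussed in this chapter (heavy left tails, post-1990 rises) could also be quantified rather than eyeballed, for example with the sample skewness of each cluster's release years. A minimal sketch on two hypothetical, hand-made clusters (the real inputs would be the per-cluster year lists extracted from the MSD):

```python
def skewness(years):
    """Sample skewness: negative values indicate the left tail noted above,
    positive values an early bulk with a long right tail."""
    n = len(years)
    mean = sum(years) / n
    m2 = sum((y - mean) ** 2 for y in years) / n  # variance
    m3 = sum((y - mean) ** 3 for y in years) / n  # third central moment
    return m3 / m2 ** 1.5

# hypothetical clusters: one with a heavy left tail into the 70s/80s,
# one bunched in the early 90s with a long right tail
cluster_a = [1975, 1981, 1984, 1986, 1999, 2003, 2005, 2006, 2007, 2008]
cluster_b = [1990, 1991, 1991, 1992, 1992, 1993, 1994, 1998, 2004, 2010]
print(skewness(cluster_a), skewness(cluster_b))
```

A single signed number per cluster would make comparisons like "cluster 6(0.1) is more 80s-weighted than cluster 28(0.1)" checkable at a glance.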
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique, so I discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters mapped easily to clusters in the other two α runs, like cluster 17(0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28(0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing the clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28(0.1), for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various factors working against it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the measurements on each timbre graph, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
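Mauch's listening-based labeling could be approximated with the data already on hand: for each timbre category, pull the frames closest to its centroid (the rows of TIMBRE_CLUSTERS in Appendix A.4) and play the corresponding song excerpts to listeners. A minimal sketch of the selection step, shown on toy 2-D frames rather than the real 12-D timbre vectors:

```python
def nearest_frames(frames, centroid, k=3):
    """Return the k timbre frames closest (in squared Euclidean distance)
    to a category centroid, i.e. the sounds one would play to listeners
    in order to attach a label to that category."""
    def dist2(f):
        return sum((a - b) ** 2 for a, b in zip(f, centroid))
    return sorted(frames, key=dist2)[:k]

# toy 2-D frames and a centroid; real frames are 12-D timbre vectors
frames = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]
print(nearest_frames(frames, [1.0, 1.0], k=2))  # [[1.0, 1.0], [0.9, 1.1]]
```

Mapping each selected frame back to its (track, timestamp) pair would then yield concrete audio snippets for a listening study.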
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, the dataset's songs are accessible, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
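One concrete way to "compare songs to each other in terms of the similarity of their timbres," as suggested by [15], would be cosine similarity between the per-song timbre-category histograms already computed in Appendix A.2. A minimal sketch on hypothetical 5-category histograms (the thesis's real histograms have one count per timbre category per song):

```python
import math

def cosine_similarity(h1, h2):
    """Cosine similarity between two per-song timbre-category histograms:
    1.0 for identical timbre profiles, 0.0 for disjoint ones."""
    dot = sum(a * b for a, b in zip(h1, h2))
    n1 = math.sqrt(sum(a * a for a in h1))
    n2 = math.sqrt(sum(b * b for b in h2))
    return dot / (n1 * n2)

# hypothetical histograms: songs a and b share a dominant category; c does not
song_a = [4, 0, 1, 0, 0]
song_b = [5, 1, 0, 0, 0]
song_c = [0, 0, 0, 3, 4]
print(cosine_similarity(song_a, song_b) > cosine_similarity(song_a, song_c))  # True
```

A pairwise similarity matrix built this way would also give a quantitative check on the Dirichlet Process clusters: within-cluster similarity should exceed between-cluster similarity.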
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata of each electronic song out of the
MSD so that chord changes and timbre categories can be computed from it'''

# path reconstructed; slashes were lost in transcription
basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print ('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# pattern reconstructed: matches each str()-serialized song dict in the file
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c/json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t/json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
               1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
               1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
               1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# pattern reconstructed: matches each str()-serialized song dict in the file
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],
                        [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],
                        [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],
                        [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],
                        [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],
                        [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],
                        [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],
                        [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],
                        [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],
                        [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],
                        [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],
                        [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],
                        [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],
                       [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],
                       [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],
                       [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],
                       [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],
                       [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],
                       [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],
                       [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],
                       [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],
                       [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],
                       [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],
                       [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],
                       [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
# centroids of the timbre categories (one 12-D vector per category)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01,
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / (
                (stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-35, Dec 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
2.3 Collecting Data and Preprocessing Selected Features
2.3.1 Collecting the Data
Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than of the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
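The tag-matching step above can be sketched as follows. The helper name `is_electronic` is my own illustration, not a function from the thesis; a real run would loop over the MSD's HDF5 files with the standard `hdf5_getters` module (shown only in the comment), so treat this as a sketch of the filtering logic rather than the exact script used.

```python
# Sketch of the filtering step: keep a song when any of its artist's
# MusicBrainz genre tags matches a target electronic genre.
# In the real pipeline the tags would come from the MSD, e.g.:
#   h5 = hdf5_getters.open_h5_file_read(path)
#   tags = hdf5_getters.get_artist_mbtags(h5)
TARGET_GENRES = {
    'house', 'techno', 'drum and bass', 'jungle', 'breakbeat', 'trance',
    'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
    '8-bit', 'ambient', 'dance and electronica', 'electronic',
}

def is_electronic(artist_mbtags):
    """True if any artist tag matches a target genre, case-insensitively."""
    return any(str(tag).strip().lower() in TARGET_GENRES for tag in artist_mbtags)
```

A song whose artist carries a tag like 'Techno' would be kept, while an artist tagged only 'rock' would be skipped; this is exactly the artist-level (rather than song-level) criterion discussed above.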
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution of each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed over every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)
where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".
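As a concrete illustration of the template-matching step, the sketch below scores a chroma frame against the 12 major-chord templates using the ρ formula above (a sum of cross-products over the 12 pitch classes divided by the two standard deviations). Only the major templates are shown here as an assumption-free simplification; the thesis scores minor, dominant 7, and minor 7 templates the same way (see the appendix code).

```python
import numpy as np

# C major template with C at index 0; the other 11 major templates are
# rotations of it, mirroring the transposition idea used for pitches.
C_MAJOR = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)
MAJOR_TEMPLATES = [np.roll(C_MAJOR, k) for k in range(12)]

def rho(template, chroma):
    """Sum over the 12 pitch classes of (CT_i - mean(CT))(c_i - mean(c)),
    divided by the product of the two standard deviations."""
    num = np.sum((template - template.mean()) * (chroma - chroma.mean()))
    return float(num / (template.std() * chroma.std()))

def best_major_chord(chroma):
    """Index of the major template (0 = C) with the largest |rho|."""
    scores = [rho(t, chroma) for t in MAJOR_TEMPLATES]
    return int(np.argmax(np.abs(scores)))
```

A chroma frame that exactly matches the C major template scores highest against template 0; noisier frames are resolved by whichever template correlates most strongly.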
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I have included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: pipeline from raw pitch metadata to a chord change vector, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes; the figure shows the pitch-class distributions for the first 5 time frames. (2) Average the distribution of pitches over every 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho; here F♯ major, with template (0,1,0,0,0,0,1,0,0,0,1,0). (4) For two adjacent chords (here F♯ major to G♯ major: a major-to-major change with step size 2, chord shift code 6), calculate the change between them and increment the count in a table of chord change frequencies (192 possible chord changes): chord_changes[6] += 1. (5) The result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song,
I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
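The cluster-count selection described above can be sketched with scikit-learn's GaussianMixture: fit one model per candidate component count and keep the lowest-BIC fit. The thesis scanned 10-100 components over the 16,800 real 12-dimensional timbre frames; the synthetic blobs and small scan below are only to keep the sketch self-contained and fast.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm_by_bic(frames, candidate_counts, seed=0):
    """Fit a GMM per candidate component count; return the lowest-BIC fit."""
    best_model, best_bic = None, np.inf
    for k in candidate_counts:
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model

# Two well-separated synthetic "timbre" blobs in 12 dimensions.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0.0, 1.0, (100, 12)),
                    rng.normal(8.0, 1.0, (100, 12))])
model = best_gmm_by_bic(frames, candidate_counts=[1, 2, 3, 4])
```

On this toy data the BIC scan should settle on 2 components; on the real timbre frames the same procedure is what selected 46 clusters.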
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
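The duplication trick can be sketched in a few lines; the copy count of 4 here is my illustrative choice (4 × 46 = 184 features, roughly balancing the 192 chord-change features), not a value taken from the thesis.

```python
import numpy as np

# Tile the 46-element timbre frequency vector so that timbre contributes
# a feature count comparable to the 192 chord-change features. The copy
# count (4) is an assumed illustration, not the thesis's value.

def build_feature_vector(chord_changes, timbre_freqs, timbre_copies=4):
    return np.concatenate([chord_changes, np.tile(timbre_freqs, timbre_copies)])

song_features = build_feature_vector(np.zeros(192), np.ones(46))
```

Since the DPGMM treats every coordinate equally, repeating the timbre block effectively multiplies timbre's contribution to the (diagonal) distance between songs by the number of copies, which is the weighting effect being sought.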
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but the solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount, with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the resulting clusterings, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
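The runs described above used scikit-learn's DPGMM class, which has since been removed from the library; the sketch below uses its modern replacement, BayesianGaussianMixture with a Dirichlet-process prior, where `weight_concentration_prior` plays the role of α and `n_components` is the 50-cluster cap. The three-blob toy data stands in for the real scaled 238-feature song vectors, so the cluster counts here are only illustrative.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def dp_cluster_count(features, alpha, max_components=50, seed=0):
    """Fit a truncated DP mixture and count clusters that receive songs."""
    dpgmm = BayesianGaussianMixture(
        n_components=max_components,                       # upper limit on clusters
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,                  # the alpha varied above
        max_iter=500,
        random_state=seed,
    ).fit(features)
    return len(np.unique(dpgmm.predict(features)))

# Three well-separated toy blobs in place of the real song features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(6.0 * i, 1.0, (60, 5)) for i in range(3)])
n_clusters = dp_cluster_count(X, alpha=0.1, max_components=10)
```

Because the truncated DP only populates as many of the `n_components` slots as the data and α support, counting the distinct predicted labels recovers the effective number of clusters, which is how the 9, 14, and 19 figures arise for the three α values.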
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster | Song Count | Characteristic Sounds
0       | 6481       | Minimalist, industrial space sounds, dissonant chords
1       | 5482       | Soft, New Age, ethereal
2       | 2405       | Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3       | 360        | Very dense and complex synths, slightly darker tone
4       | 4550       | Heavily distorted rock and synthesizer
6       | 2854       | Faster-paced 80s synth rock, acid house
8       | 798        | Aggressive beats, dense house music
9       | 1464       | Ambient house, trancelike, strong beats, mysterious tone
11      | 1597       | Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs and, as they did not sound unique, I discarded them from the clusters.) Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0       | 1339       | Instrumental and disco with 80s synth
1       | 2109       | Simultaneous quarter-note and sixteenth-note rhythms
2       | 4048       | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3       | 1353       | Strong repetitive beats, ambient
4       | 2446       | Strong simultaneous beat and synths; synths defined but echo
5       | 2672       | Calm, New Age
6       | 542        | Hi-hat cymbals, dissonant chord progressions
7       | 2725       | Aggressive punk and alternative rock
9       | 1647       | Latin, rhythmic emphasis on first and third beats
11      | 835        | Standard medium-fast rock instruments/chords
16      | 1152       | Orchestral, especially violins
18      | 40         | "Martian alien" sounds, no vocals
20      | 1590       | Alternating strong kick and strong high-pitched clap
28      | 528        | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0       | 4075       | Nostalgic and sad-sounding synths and string instruments
1       | 2068       | Intense, sad, cavernous (mix of industrial, metal, and ambient)
2       | 1546       | Jazz/funk tones
3       | 1691       | Orchestral with heavy 80s synths, atmospheric
4       | 343        | Arpeggios
5       | 304        | Electro, ambient
6       | 2405       | Alien synths, eerie
7       | 1264       | Punchy kicks and claps, 80s/90s tilt
8       | 1561       | Medium tempo, 4/4 time signature, synths with intense guitar
9       | 1796       | Disco rhythms and instruments
10      | 2158       | Standard rock with few (if any) synths added on
12      | 791        | Cavernous, minimalist ambient (non-electronic instruments)
14      | 765        | Downtempo, classic guitar riffs, fewer synths
16      | 865        | Classic acid house sounds and beats
17      | 682        | Heavy Roland TR sounds
22      | 14         | Fast ambient, classic orchestral
23      | 578        | Acid house with funk tones
30      | 31         | Very repetitive rhythms, one or two tones
34      | 88         | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently Not
surprisingly the distributions of the years of songs in most of the clusters were skewed
to the left because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 22) However some of the distributions vary significantly
for individual clusters and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos
musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks) [13] coincides with the when the instruments were first
manufactured in 1980 Not surprisingly this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s However there were a few songs
in that cluster that came out before 1980 While these songs did not clearly use the
Roland TR machines they may have contained similar sounds that predated the
machines and were truly novel
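Before examining individual α values, it is worth recalling how α shapes cluster formation in the abstract. A minimal simulation of the Chinese restaurant process, the sequential view of the Dirichlet Process prior, shows how the concentration parameter alone influences the number of clusters (the function name and seeds below are illustrative, not part of the thesis code):

```python
import random

def crp_num_clusters(n, alpha, seed=0):
    """Simulate a Chinese restaurant process with n customers (songs) and
    concentration alpha; return the number of occupied tables (clusters)."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of songs assigned to cluster k
    for i in range(n):
        # the (i+1)-th song opens a new cluster with probability alpha / (alpha + i)
        if rng.random() < alpha / (alpha + i):
            counts.append(1)
        else:
            # otherwise it joins an existing cluster proportionally to its size
            r = rng.random() * i
            for k, c in enumerate(counts):
                r -= c
                if r < 0:
                    counts[k] += 1
                    break
    return len(counts)
```

Under this prior alone the expected number of clusters grows only logarithmically in the number of songs (roughly α log n, about 2 for α = 0.2 and n ≈ 23,000); that the fitted mixtures here produce 16 to 22 clusters at the larger α values suggests that the likelihood of the pitch and timbre features, rather than the prior, drives most of the cluster formation.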
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens: the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together by the common thread of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a year distribution closer to that of the entire MSD and also consists of denser beats; another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer and pioneer of ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was not a term used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting the different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category, playing the sounds, and attaching interpretations based on several listeners [8]. While this strategy worked in Mauch's study, it was not practical in mine given the time and resources at my disposal. Instead, I compared my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data are very similar for every cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the MSD data and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
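The chord-change type numbering used in these charts (types 0, 60, 120, and 180 above) comes from a fixed arithmetic encoding of quality pairs and root shifts, mirroring the preprocessing code in Appendix A.2. A minimal standalone sketch, with the quality numbering (major = 1, minor = 2, dominant 7th major = 3, dominant 7th minor = 4) inferred from those type correspondences:

```python
# Chord qualities, numbered as inferred from the type 0/60/120/180 anchors.
MAJOR, MINOR, DOM7_MAJOR, DOM7_MINOR = 1, 2, 3, 4

def chord_change_category(quality1, root1, quality2, root2):
    """Map a pair of adjacent chords to one of the 192 chord-change
    categories (16 quality pairs x 12 root shifts); roots are 0..11."""
    if root1 == root2:
        note_shift = 0
    elif root1 < root2:
        note_shift = root2 - root1
    else:
        note_shift = 12 - root1 + root2
    key_shift = 4 * (quality1 - 1) + quality2   # 1..16
    return 12 * (key_shift - 1) + note_shift    # 0..191

# The four anchor types named in the text (same quality, same root):
print(chord_change_category(MAJOR, 0, MAJOR, 0))            # 0
print(chord_change_category(MINOR, 0, MINOR, 0))            # 60
print(chord_change_category(DOM7_MAJOR, 0, DOM7_MAJOR, 0))  # 120
print(chord_change_category(DOM7_MINOR, 0, DOM7_MINOR, 0))  # 180
```

For example, a C major chord followed by an A minor chord (quality 1 at root 0, then quality 2 at root 9) lands in category 12·(2−1) + 9 = 21.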
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed at other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these consisted of only one song each, and upon listening, neither song sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}.
One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 consisted of vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm new age, also contained vocal-free, ethereal, and spacey sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters at the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing clusters: the y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as, for example, those of cluster 28_{0.1}. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small fraction of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
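The clustering itself can be reproduced today with off-the-shelf tools. The sketch below fits scikit-learn's truncated variational approximation to a Dirichlet Process Gaussian mixture on synthetic stand-in features; `BayesianGaussianMixture` postdates this thesis, so this is a present-day analogue rather than the original implementation, with `weight_concentration_prior` playing the role of α:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Stand-in for the per-song feature vectors (chord-change and timbre
# frequencies): three synthetic "styles" in 10 dimensions.
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(200, 10))
               for m in (-2.0, 0.0, 2.0)])

dpgmm = BayesianGaussianMixture(
    n_components=30,                      # truncation level, not the cluster count
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,       # the alpha analog
    covariance_type="diag",
    max_iter=500,
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
n_used = len(set(labels))                 # clusters actually occupied
```

`n_components` only caps the number of components; the number of clusters actually used is read off from the occupied components after fitting, and raising `weight_concentration_prior` tends to spread weight over more of them, mirroring the growth in cluster counts observed above.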
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies for dealing with those weaknesses; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that the dataset had some credibility. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from the group Last.fm, which contains user-generated tags at the song level.
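Song-level filtering of this kind is a small change once per-track tags are loaded. A minimal sketch, using made-up track IDs and tags purely for illustration (the real Last.fm dataset keys its user-generated tags by MSD track ID):

```python
# Hypothetical song-level tags; only the structure is meaningful here.
song_tags = {
    'TRXXXX1': ['acid house', 'electronic', '80s'],
    'TRXXXX2': ['classic rock', 'guitar'],
    'TRXXXX3': ['ambient', 'chillout'],
}

EM_GENRES = {'house', 'acid house', 'techno', 'trance', 'ambient',
             'electronic', 'downtempo', 'industrial', 'synthpop', 'idm'}

def is_em_song(tags):
    """Keep a song only if one of its OWN tags is an EM genre,
    rather than relying on artist-level tags."""
    return any(t.lower() in EM_GENRES for t in tags)

em_track_ids = [tid for tid, tags in song_tags.items() if is_em_song(tags)]
```

Only songs whose own tags intersect the EM genre list survive, so a rock song by an artist carrying an artist-level 'electronic' tag would be excluded.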
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small research group without extensive funding and influence. Once these problems are resolved, the dataset's songs are accessible, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it does at all? And lastly, how can these findings be used to compose new genres of music and predict who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows further, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the title, artist, year, duration, timbre, and pitch
metadata for each electronic song found in the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# mean of a list (applied column-wise below via zip)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre
category counts for each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# match each song's metadata dict in the raw text dump
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# match each song's metadata dict in the raw text dump
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
  [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
   8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
  [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
   -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
  [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
   8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
  [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
   -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
  [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
   2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
  [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
   -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
  [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
   -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
  [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
   -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
  [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
   -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
  [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
   2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
  [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
   -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
  [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
   2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
  [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
   1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
  [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
   -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
  [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
   -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
  [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
   1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
  [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
   3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
  [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
   -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
  [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
   -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
  [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
   -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
  [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
   2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
  [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
   -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
  [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
   1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
  [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
   8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
  [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
   -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
  [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
   3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
  [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
   3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
  [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
   -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
  [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
   -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
  [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
   -1.98576098e-01, -3.02173163e-01, 2.04399765e-01,
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw msd data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / (
                (stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:
target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
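Because the MSD attaches genre terms to artists rather than to individual songs, selecting the corpus reduces to a tag-membership test against this list. A minimal sketch of that test (the function name and the normalization are my own, not the thesis code):

```python
def is_electronic(artist_terms, target_genres):
    """Return True if any artist-level tag matches a target EM genre.

    `artist_terms` is the list of raw tags attached to a song's artist;
    matching is case-insensitive with surrounding whitespace ignored.
    """
    terms = {term.strip().lower() for term in artist_terms}
    return any(genre in terms for genre in target_genres)
```

Any song whose artist carries at least one matching tag is pulled into the dataset, which is exactly why the false positives described above can slip through.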
2.3.2 Pitch Preprocessing
A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
For a given chroma frame c observed in the song, the Spearman's Rho coefficient is computed against every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)
where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
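The template-correlation step can be sketched as below. This is an illustrative reimplementation rather than the thesis code: only two template chords are listed (the full model correlates against all 4 chord types rotated through the 12 roots), and the small stabilizing constants that the appendix code adds to the denominators are omitted.

```python
import numpy as np

# Two example template chords with C at index 0.
TEMPLATES = {
    "C major": np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float),
    "C minor": np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0], dtype=float),
}

def rho(template, chroma):
    """Correlation between a template chord and a 12-dim chroma frame,
    mirroring the summation in the equation above."""
    num = np.sum((template - template.mean()) * (chroma - chroma.mean()))
    return num / (template.std() * chroma.std())

def most_likely_chord(chroma):
    """Pick the template with the largest |rho| for this frame."""
    return max(TEMPLATES, key=lambda name: abs(rho(TEMPLATES[name], chroma)))
```

A frame whose energy is concentrated on C, E, and G then maps to "C major".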
After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".
In my preliminary implementation of this method on an electronic dance music
corpus I made a few modifications to Mauchrsquos study First I smoothed out time
frames before computing the most probable chords rather than smoothing the most
probable chords I did this to save time and to reduce volatility in the chord
measurements Using Rick Astleyrsquos ldquoNever Gonna Give You Uprdquo as a reference
which contains 935 time frames and lasts 212 seconds 5 time frames is slightly
under 1 second and for preliminary testing appeared to be a good interval for each
time block Second as mentioned in the literature section I did not abstract the
chord changes into H-topics This decision also stemmed from time constraints since
deriving semantic chord meaning from EDM songs would require careful research
into the types of harmonies and sounds common in that genre of music Below I
included a high-level visualization of the pitch metadata found in a sample song
ldquoFirestarterrdquo by The Prodigy and how I converted the metadata into a chord change
vector that I could then feed into the Dirichlet Process algorithm
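The first modification, averaging the raw chroma frames over blocks of 5 before chord detection, can be sketched as follows (an illustrative helper, not the original code; the block size of 5 comes from the timing argument above):

```python
import numpy as np

def smooth_frames(chroma, block=5):
    """Average an (N, 12) chroma matrix over consecutive blocks of `block`
    frames, returning an (N // block, 12) matrix; a ragged tail is dropped."""
    n_blocks = len(chroma) // block
    trimmed = np.asarray(chroma[: n_blocks * block], dtype=float)
    return trimmed.reshape(n_blocks, block, 12).mean(axis=1)
```

Chord detection is then run once per smoothed block rather than once per raw frame.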
[Figure: the pitch-processing pipeline, illustrated on the first 5 time frames of "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho (here, F major). (4) For two adjacent chords (here, F major to G major, a major-to-major change with step size 2 and chord shift code 6), increment the count in a table of chord change frequencies (192 possible chord changes): chord_changes[6] += 1. (5) The result is a final 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.
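This excerpt does not spell out how the 192 chord-change codes are assigned, so the encoding below is a hypothetical reconstruction: 4 "from" chord types × 4 "to" chord types × 12 root steps gives the 192 codes, with chords as the (type, root) pairs returned by find_most_likely_chord in the appendix. Only the vector length and the per-second normalization are taken directly from the text.

```python
import numpy as np

def change_code(from_chord, to_chord):
    """Map a pair of (chord_type, root) tuples, with types numbered 1-4,
    to one of 192 codes: (from_type, to_type, root step). Assumed encoding."""
    (from_type, from_root), (to_type, to_root) = from_chord, to_chord
    step = (to_root - from_root) % 12
    return ((from_type - 1) * 4 + (to_type - 1)) * 12 + step

def chord_change_vector(chords, duration):
    """Count changes between adjacent chord blocks, normalized per second."""
    counts = np.zeros(192)
    for prev, curr in zip(chords, chords[1:]):
        counts[change_code(prev, curr)] += 1
    return counts / duration
```

Dividing by `duration` is the normalization step described above: two songs with the same harmonic behavior but different lengths end up with comparable vectors.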
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year
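The cluster-count selection can be sketched with scikit-learn. The class name below (GaussianMixture) is the current API rather than necessarily the one used in 2016, so treat the exact calls as an assumption; the logic, fitting one GMM per candidate count and keeping the lowest BIC, is what the text describes.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm_by_bic(frames, candidate_ks, seed=0):
    """Fit a GMM for each candidate component count and return the fit
    with the lowest Bayesian Information Criterion."""
    best_gmm, best_bic = None, np.inf
    for k in candidate_ks:
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic
    return best_gmm
```

The thesis sweep would be `best_gmm_by_bic(frames, range(10, 101))` over the 16,800 12-dimensional timbre frames; the same helper works on toy data.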
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
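A minimal sketch of the duplication strategy (the helper name and the default copy count are my own; the thesis leaves the number of copies open — 4 copies of the 46-dim timbre vector yields 184 timbre features against 192 pitch features, roughly evening the two out):

```python
import numpy as np

def build_features(chord_changes, timbre_counts, timbre_copies=4):
    """Concatenate the 192-dim chord-change vector with several copies of
    the 46-dim timbre vector so pitch does not swamp timbre."""
    return np.concatenate([chord_changes] + [timbre_counts] * timbre_copies)
```

Each song's final feature vector is then `build_features(chord_vec, timbre_vec)`.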
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and often hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor. After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify the different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is fairly simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.
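The clustering runs themselves can be sketched with scikit-learn. The thesis used the library's since-removed DPGMM class; the closest current equivalent is BayesianGaussianMixture with a truncated Dirichlet Process prior, where weight_concentration_prior plays the role of α and n_components is the 50-cluster cap. The calls below are therefore an approximation of the original setup, not a reproduction of it.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def cluster_songs(features, alpha, max_clusters=50, seed=0):
    """Fit a truncated Dirichlet Process GMM with concentration `alpha`
    and return a cluster label for every song."""
    dpgmm = BayesianGaussianMixture(
        n_components=max_clusters,
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,
        random_state=seed,
    ).fit(features)
    return dpgmm.predict(features)
```

Sweeping alpha over (0.05, 0.1, 0.2) and counting the distinct labels returned gives the kind of 9/14/19-cluster comparison reported in the Findings.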
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster paced; 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; New Wave rock in 80s, then starting in 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial, metal, and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process these songs were lumped together with the common theme of dense, melodic
material (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and Jon Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then increases rapidly in popularity.
This cluster contains songs with a hypnotically repetitive rhythm, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note
rhythm that, combined with the ethereal synths and their chord progressions,
sounded very similar to trance music. In fact, I found that, stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change; type 60, minor → minor with no note change; type
120, dominant 7th major → dominant 7th major with no note change; and type 180,
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that chords in the song occurring next to each other are remaining in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to intuitively interpret. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in mine. I ended up comparing my subjective
summaries of each cluster against the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data is very similar for each cluster. This problem does not
occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
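The four "same type, no note change" categories enumerated above follow directly from the encoding in the helper code of Appendix A, where each chord is a (chord type, root) pair with type 1 through 4 (major, minor, dominant 7th major, dominant 7th minor) and root 0 through 11. A minimal sketch of that mapping:

```python
def chord_change_category(c1, c2):
    """Map consecutive chords to one of the 192 chord-change categories.
    Each chord is a (chord_type, root) pair: chord_type in 1..4 and
    root in 0..11 (C natural up to B natural)."""
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]       # wrap around the octave
    key_shift = 4 * (c1[0] - 1) + c2[0]       # 1..16 index over type pairs
    return 12 * (key_shift - 1) + note_shift  # 0..191

# The categories discussed above: same chord type, no note change.
assert chord_change_category((1, 0), (1, 0)) == 0    # major -> major
assert chord_change_category((2, 5), (2, 5)) == 60   # minor -> minor
assert chord_change_category((3, 9), (3, 9)) == 120  # dom. 7th major pair
assert chord_change_category((4, 2), (4, 2)) == 180  # dom. 7th minor pair
```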
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
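For intuition on that sped-up listening test: playing a track at 1.5 times normal speed amounts to resampling the waveform at 1.5x time steps, which raises tempo and pitch together. A crude numpy sketch, illustrative only and not the procedure used on the songs themselves:

```python
import numpy as np

def speed_up(samples, rate=1.5):
    """Resample a waveform at `rate` times the original step, shortening it
    and shifting every frequency up by the same factor."""
    idx = np.arange(0, len(samples), rate)
    return np.interp(idx, np.arange(len(samples)), samples)

# a 5 Hz test tone over one second becomes a faster, higher-rate signal
tone = np.sin(2 * np.pi * 5 * np.linspace(0.0, 1.0, 1000))
fast = speed_up(tone, 1.5)
```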
For larger values of α it is not only worth looking at interesting phenomena in
the clusters formed for that specific value but also comparing the clusters formed
to other values of α Since we are increasing the value of α more clusters will
be formed and the distinctions between each cluster will be more nuanced With
α = 01 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of
only one song each and upon listening neither of these songs sounded particularly
unique so I threw those two clusters out and analyzed the remaining 14 Comparing
these clusters to the ones formed with α = 005 I found that some of the clusters
mapped over nicely while others were more difficult to interpret For example cluster
301 (cluster 3 when α = 01) contained a similar number of songs and a similar
distribution of the years the songs were released to cluster 9005 Both contain vitually
no songs before the 1990s and then steadily rise in popularity through the 2000s
Both clusters also contain similar types of music house beats and ethreal synths
reminiscent of ambient or trance music However when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 group differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked
up more nuanced instrumentation and mood differences. For example, cluster 16_0.1
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New World, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. From these clusters, I
added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
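The cluster-to-cluster correspondences described above (for example, cluster 3 at α = 0.1 lining up with cluster 9 at α = 0.05) can be made concrete with a small overlap table over the two runs' label assignments. The labels below are hypothetical stand-ins for the real assignments:

```python
from collections import Counter

def cluster_overlap(labels_a, labels_b):
    """For each cluster in run A, count how its songs distribute over the
    clusters of run B; one large overlap suggests the cluster 'maps over'."""
    table = Counter(zip(labels_a, labels_b))
    overlap = {}
    for (a, b), n in table.items():
        overlap.setdefault(a, {})[b] = n
    return overlap

# six songs labeled under alpha = 0.05 and alpha = 0.1 (made-up labels)
run_005 = [9, 9, 9, 3, 3, 8]
run_01  = [3, 3, 3, 5, 5, 7]
print(cluster_overlap(run_005, run_01))
# cluster 9 of the first run maps entirely onto cluster 3 of the second
```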
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters. Three of these clusters contained only one song each, and upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR
drum machine sounds and is comparable to cluster 28_0.1. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing different clusters. The y-axis values for all of the charts are quite small,
implying that many of the timbre values averaged out because the songs were quite
different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions
to these observations were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an inadequate job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; I then offer potential paths for researchers to build upon my
experiment, and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the electronic
music that was available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included contained
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept: although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
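A sketch of the song-level filter this would enable, with made-up track IDs and tags (the Last.fm dataset distributes its user-generated tags keyed by MSD track ID):

```python
# subset of the predetermined EM genre list used for the artist-level filter
EM_TAGS = {'house', 'techno', 'trance', 'ambient', 'breakbeat', 'industrial'}

# hypothetical song-level tags of the kind the Last.fm dataset provides
song_tags = {
    'TRAAAAA1': ['rock', 'guitar'],      # non-EM song by a mixed-genre artist
    'TRAAAAA2': ['techno', 'dance'],
    'TRAAAAA3': ['ambient', 'chillout'],
}

# keep only songs that are themselves tagged with an EM genre
em_track_ids = sorted(tid for tid, tags in song_tags.items()
                      if any(t in EM_TAGS for t in tags))
print(em_track_ids)  # the rock song by the mixed-genre artist is excluded
```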
Another, more addressable, weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the timbre measurements on each graph, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements across all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is at effectively clustering songs into distinct categories.
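The BIC-based choice of the number of timbre categories mentioned above can be sketched with scikit-learn's mixture models (the same library imported in the appendix code), here on a tiny synthetic 1-D dataset rather than the 12-dimensional timbre frames:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# two well-separated synthetic "timbre" components
X = np.concatenate([rng.normal(0, 1, 200), rng.normal(8, 1, 200)]).reshape(-1, 1)

# fit candidate models and keep the component count with the lowest BIC
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 5)}
best_k = min(bics, key=bics.get)
print(best_k)  # the BIC minimum sits at the true component count, 2
```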
4.2 Future Work
Future work in this area (quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists) would involve tighter definitions, procedures,
evaluations of whether clustering was effective, and musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD, and constantly updating it, might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, and the songs
accessed from the dataset and the methods for comparing songs to each other are
settled, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists, for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even more, and groups such as Spotify amass greater amounts of information
and deeper insights into that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has, up
until now, mostly been analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata (title, artist, year, duration, timbre, and
pitch segments) of every electronic music song out of the Million Song
Dataset and writes it to disk, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):  # one song's metadata dict
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0]-1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)  # one song's metadata dict
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02,
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (chord type, template index)
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
where C̄T is the mean of the values in the template chord CT, σCT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed and the changes between adjacent chords are observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords; I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.
[Figure: the pitch preprocessing pipeline, illustrated on "Firestarter" by The Prodigy. The pipeline starts with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 is the number of pitch classes; averages the distribution of pitches over every block of 5 time frames; calculates the most likely chord for each block using Spearman's rho (e.g., F major, (0,1,0,0,0,0,1,0,0,0,1,0)); and, for each pair of adjacent chords, calculates the change between them (e.g., F major → G major, a major-to-major shift of step size 2, chord change code 6) and increments the corresponding count in a table of chord change frequencies (192 possible chord changes). The result is a final 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
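The chord-change bookkeeping described above can be sketched as follows. The mapping from a pair of (chord type, root) labels to one of the 4 × 4 × 12 = 192 codes is a hypothetical bijection chosen for illustration; the thesis's actual code table (in which an F major → G major shift carries code 6) is not reproduced here, but the counting and per-second normalization follow the text.

```python
import numpy as np

# Chord types follow find_most_likely_chord in the appendix:
# 1 = major, 2 = minor, 3 = dominant 7th major, 4 = dominant 7th minor.

def chord_change_code(from_chord, to_chord):
    """Map a pair of (type, root) chords to a code in 0..191 (illustrative encoding)."""
    t1, r1 = from_chord
    t2, r2 = to_chord
    step = (r2 - r1) % 12  # relative root movement in semitones
    return ((t1 - 1) * 4 + (t2 - 1)) * 12 + step

def chord_change_vector(chords, duration_seconds):
    """Count each chord-change code over adjacent chords, normalized per second."""
    counts = np.zeros(192)
    for a, b in zip(chords, chords[1:]):
        counts[chord_change_code(a, b)] += 1
    return counts / duration_seconds

# Example: F major -> G major (same type, root up 2 semitones) in a 212-second song.
chords = [(1, 5), (1, 7)]
vec = chord_change_vector(chords, duration_seconds=212.0)
```

Under this encoding a major-to-major change keeps only the semitone step, so all songs are compared by relative chord movement regardless of key, exactly as the smoothing-and-differencing step above intends.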
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000: the mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs; in order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
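The BIC-based selection of the cluster count can be sketched with scikit-learn. The frames below are synthetic stand-ins for the 16,800 sampled MSD timbre frames, and the small sweep range just keeps the example fast (the thesis sweeps 10 to 100).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the 16,800 x 12 matrix of sampled timbre frames:
# three well-separated Gaussian blobs in 12 dimensions.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(loc=m, scale=1.0, size=(200, 12)) for m in (-4.0, 0.0, 4.0)])

# Sweep candidate cluster counts and keep the model with the lowest BIC.
best_k, best_bic, best_gmm = None, np.inf, None
for k in range(2, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic, best_gmm = k, bic, gmm

# best_gmm.means_ plays the role of the saved per-cluster timbre means.
```

Re-running the mixture at the BIC-minimizing count and storing `best_gmm.means_` mirrors the step where the 46 cluster means are saved for later category assignment.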
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set already consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
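A minimal sketch of the duplication idea, with random stand-in feature vectors. The multiplier of 4 is an illustrative choice that makes 4 × 46 = 184 timbre features roughly balance the 192 pitch features; the thesis does not state the exact multiplier used.

```python
import numpy as np

# Hypothetical per-song features: 192 chord-change frequencies and
# 46 timbre-category frequencies (random stand-ins here).
rng = np.random.default_rng(1)
pitch_vec = rng.random(192)
timbre_vec = rng.random(46)

# Duplicating the timbre block approximately balances the two feature groups,
# since the clustering has no per-feature weighting option.
TIMBRE_COPIES = 4
features = np.concatenate([pitch_vec] + [timbre_vec] * TIMBRE_COPIES)
```

The effect is the same as giving each timbre feature 4 times the weight of a pitch feature in any distance-like computation over the concatenated vector.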
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but that solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process: extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, and a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify the different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is fairly simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Analysis section. For each value of α, I set the upper limit on the number of components, or clusters, to 50. The values of α I used resulted in 9, 14, and 19 clusters, respectively.
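This clustering setup can be sketched under current scikit-learn. The thesis used the since-removed sklearn.mixture.DPGMM class; BayesianGaussianMixture with a Dirichlet-process prior is its modern counterpart. The data here is a small synthetic stand-in rather than the real 238-feature song vectors, so the resulting cluster count is only illustrative.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in: two groups of small per-second frequency vectors.
rng = np.random.default_rng(2)
songs = np.vstack([rng.normal(loc=c, scale=0.005, size=(100, 20)) for c in (0.0, 0.05)])

K = 10.0     # scaling factor so common ranges of alpha can separate the clusters
ALPHA = 0.1  # concentration parameter; the thesis tries 0.05, 0.1, and 0.2

dp = BayesianGaussianMixture(
    n_components=50,  # upper bound on the number of clusters, as in the thesis
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=ALPHA,
    max_iter=500,
    random_state=0,
)
labels = dp.fit_predict(songs * K)
n_clusters = len(np.unique(labels))
```

Note that, as in the thesis, scaling the raw frequencies by K happens before fitting, and the model is free to use far fewer than the 50 allowed components; larger α tends to yield more occupied clusters.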
3.2 Findings
321 α=005
When I set α to 005 the Dirichlet Process split the songs into 9 clusters Below are
the distribution of years of the songs in each cluster (note that the Dirichlet Process
29
does not number the clusters exactly sequentially so cluster numbers 5 7 and 10 are
skipped)
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
322 α=01
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since neither sounded unique, I discarded those clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echoing
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin; rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
323 α=02
With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted
of 1 song each none of which were particularly unique-sounding so I discarded them
for a total of 19 significant clusters Again the song distributions timbre and pitch
distributions and cluster descriptions are shown below
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset: mainly, which artists and songs were unique, and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens; that is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic
sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats; another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
- Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
- Cabaret Voltaire: orchestral electronic music
- Paul Horn: new age
- Brian Eno: ambient music
- Manuel Göttsching (Ashra): synth-heavy ambient music
- Killing Joke: industrial metal
- John Foxx: minimalist and dark electronic music
- Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with the existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth looking not only at interesting phenomena in the clusters formed for that specific value but also at how those clusters compare to the ones formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
49
artists in cluster 301 they were different from the earliest artists in cluster 9005
One particular artist Bill Nelson stood out for having a particularly novel song
ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and
twangy synth beat that when sped up sounded like minimalist acid house music
While the α = 005 group differentiated mostly on general moods and classes of
instruments (like rock vs non-electronic vs electronic) the α = 01 group picked
up more nuanced instrumentation and mood differences For example cluster 1601
contained songs that featured orchestral string instruments especially violin The
songs themselves varied significantly according to traditional genres from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal
band which contained violin interludes This clustering raises an interesting point
that music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds Another cluster 2801 features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through
the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating
a style more popular in the 1980s and the characteristic sound high-hat cymbals
is also a specialized instrument This specialization does not match up particularly
strongly with the clusters when α = 005 That is a single cluster with α = 005
does not easily map to one or more clusters in the α = 01 run although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables The timbrechord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster For
example the last timbre category is significantly pronounced for clusters 5 and 18
and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds
so it would make sense that cluster 5 which was mainly calm New World also
contained vocal-free ethereal and space-y sounds It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. From these clusters I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
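The cross-α cluster matching used above (pairing clusters from different runs by the similarity of their release-year profiles) could be made quantitative with a histogram-intersection score. The function and toy year lists below are illustrative assumptions, not part of the original experiment.

```python
import numpy as np

def year_profile(years, lo=1956, hi=2010):
    """Normalized histogram of release years, one bin per year."""
    hist, _ = np.histogram(years, bins=np.arange(lo, hi + 2))
    return hist / hist.sum()

def year_overlap(years_a, years_b):
    """Histogram intersection: 1.0 = identical year profiles, 0.0 = disjoint."""
    pa, pb = year_profile(years_a), year_profile(years_b)
    return float(np.minimum(pa, pb).sum())

# toy clusters whose song counts both rise through the 1990s and 2000s
a = [1995] * 10 + [2000] * 30 + [2005] * 60
b = [1995] * 20 + [2000] * 20 + [2005] * 60
print(year_overlap(a, b))  # -> 0.9 (high overlap: plausibly the "same" cluster)
```

A score near 1 would support treating two clusters as counterparts across runs, while a score near 0 would flag pairs like the ones whose earliest artists turned out to differ.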
Finally, we look at α = 0.2. With this parameter value the Dirichlet Process resulted
in 22 clusters; 3 of these clusters contained only one song each, and upon
listening to each of these songs I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α runs, like cluster 17_0.2, which contains Roland TR
drum machine sounds and is comparable to cluster 28_0.1. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
seem to vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord-change and timbre charts also reflect the difficulty
of distinguishing clusters: the y-axes for all of the years are quite small,
implying that many of the timbre values averaged out because the songs in each cluster
were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. I therefore
concluded that the Dirichlet Process with α = 0.2 did an inadequate job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies for
correcting them; I then offer potential paths for researchers to build upon my
experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the
electronic music available. From looking through the tracks I did see many important
artists, lending some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included were
represented by only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criterion
involved iterating through every song and selecting those whose artist carried a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset
from Last.fm, which contains user-generated tags at the song level.
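The selection criterion described here can be sketched as a simple tag-intersection test. The genre list below is an abbreviated version of the one in Appendix A.1, and the song records are made up for illustration; they mirror, but are not taken from, the actual MSD pipeline.

```python
# abbreviated version of the target_genres list used in Appendix A.1
TARGET_GENRES = {'house', 'techno', 'trance', 'ambient', 'industrial', 'idm'}

def is_em_song(artist_tags):
    """Keep a song if any of its artist's tags names a known EM genre."""
    return any(tag.lower() in TARGET_GENRES for tag in artist_tags)

# hypothetical records: an artist-level tag can admit non-EM songs too
songs = [
    {'title': 'Oxygene (Part IV)', 'tags': ['ambient', 'electronic']},
    {'title': 'Some Rock Song', 'tags': ['hard rock']},
]
kept = [s['title'] for s in songs if is_em_song(s['tags'])]
print(kept)  # -> ['Oxygene (Part IV)']
```

Replacing the artist-level tags with Last.fm's song-level tags in `is_em_song` is exactly the refinement proposed above: the filter's logic stays the same, only the tag source changes.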
Another, more addressable, weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed were nontrivial: I could not only
eyeball the measurements on each timbre graph, as I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successful the Dirichlet
Process is at clustering songs into distinct categories.
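As a sketch of the direction [15] points toward, song-level timbre similarity can be approximated by averaging each song's per-segment 12-dimensional timbre vectors and comparing songs by cosine similarity. This is a simplified stand-in for that study's methods, with made-up segment data, not a reproduction of it.

```python
import numpy as np

def timbre_signature(segments):
    """Collapse a song's per-segment 12-D timbre vectors into one mean vector."""
    return np.asarray(segments, dtype=float).mean(axis=0)

def timbre_similarity(segs_a, segs_b):
    """Cosine similarity of two songs' timbre signatures (1.0 = same direction)."""
    a, b = timbre_signature(segs_a), timbre_signature(segs_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
song_a = rng.normal(size=(300, 12)) + 2.0                 # toy "song": 300 segments
song_b = song_a + rng.normal(scale=0.1, size=(300, 12))   # near-duplicate (a remix, say)
song_c = rng.normal(size=(300, 12)) - 2.0                 # very different timbre profile
```

Averaging discards the repeating-pattern structure [15] exploits, but even this crude signature would give a numeric check on whether songs clustered together actually sound alike.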
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, the dataset's
songs are accessible, and methods for comparing songs to each other are in place,
the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows and groups such as Spotify amass greater amounts of information
and deeper insights into that information, this relatively new field of study will
hopefully grow with it. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been analyzed mostly from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import collections
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata (title, artist, year, duration,
pitches, timbre) for every electronic song found in the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# an OrderedDict keeps the songs in chronological order (a plain dict would not)
all_song_data_sorted = collections.OrderedDict(
    sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + \
    re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used together with zip(*segments))
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre category
counts in each electronic song; the Dirichlet Process is run on its output.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in
                          segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg
                   in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]  # one bin per timbre category (see A.4)
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t
                                            in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden when running locally
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03,
     -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02,
     1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00,
     2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01,
     6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00,
     9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03,
     3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01,
     -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02,
     2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,
     1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01,
     -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01,
     1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02,
     3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02,
     -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02,
     1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,
     4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02,
     2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01,
     -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02,
     1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02,
     -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04,
     2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02,
     -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02,
     3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01,
     -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02,
     3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01,
     1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02,
     -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00,
     -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00,
     -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01,
     3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04,
     4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00,
     2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01,
     1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03,
     -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02,
     6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02,
     -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02,
     -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02,
     1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01,
     -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00,
     5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00,
     5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,
     5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02,
     -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01,
     -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01,
     -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00,
     -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00,
     1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01,
     -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00,
     1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00,
     1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01,
     -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02,
     3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01,
     3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01,
     -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01,
     -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01,
     -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00,
     -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,
     6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02,
     -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01,
     2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01,
     2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01,
     5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02,
     1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00,
     -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02,
     -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00,
     2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00,
     -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01,
     -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02,
     -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00,
     -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01,
     -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01,
     1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01,
     -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00,
     4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01,
     2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00,
     8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00,
     -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01,
     9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02,
     6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02,
     -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05,
     1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00,
     -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00,
     -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01,
     -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01,
     1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01,
     2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02,
     -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00,
     1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01,
     -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01,
     1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03,
     4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00,
     -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01,
     6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235
236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
'''given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (chord type, template index)
    most_likely_chord = (1, 1)
    chord_families = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs),
    ]
    for chord_type, templates, means, stdevs in chord_families:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (chord_type, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
[Figure: Overview of the pitch preprocessing pipeline, illustrated on the first 5 time frames of "Firestarter" by The Prodigy. Starting with the raw pitch data, an N×12 matrix where N is the number of time frames in the song and 12 the number of pitch classes, the pitch distribution is averaged over every 5 time frames; the most likely chord for each block is calculated using Spearman's rho against each chord template (e.g., F major, template 010000100010; the figure tabulates these correlations for chords C through B over the five example frames). For each pair of adjacent chords (e.g., F major → G major, a major-to-major change with step size 2 and chord shift code 6, so chord_changes[6] += 1), the count in a table of chord change frequencies (192 possible chord changes) is incremented, yielding a final 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
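As a sketch of the tally described above: the code below assumes a particular layout of the 192 codes, (from-type, to-type, root shift), which is consistent with the within-type codes 0, 60, 120, and 180 discussed in the Analysis chapter but is not spelled out in full in the thesis; `chord_change_code` and `chord_change_frequencies` are hypothetical helper names.

```python
import numpy as np

# Hypothetical sketch of the chord-change tally. The mapping
# (from-type, to-type, root shift) -> 0..191 is an assumption; the thesis
# fixes only that there are 192 possible chord-change codes.
def chord_change_code(from_chord, to_chord):
    (t1, root1), (t2, root2) = from_chord, to_chord  # types 1..4, roots 0..11
    return (t1 - 1) * 48 + (t2 - 1) * 12 + ((root2 - root1) % 12)

def chord_change_frequencies(chords, duration_seconds):
    counts = np.zeros(192)
    for a, b in zip(chords, chords[1:]):
        counts[chord_change_code(a, b)] += 1
    return counts / duration_seconds  # normalize to chord changes per second

chords = [(1, 5), (1, 7), (2, 7)]  # e.g. F major -> G major -> G minor
freqs = chord_change_frequencies(chords, 60.0)
```

Under this assumed layout, the same-type, same-root changes land on codes 0, 60, 120, and 180, matching the frequently observed codes noted in the Analysis chapter.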
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept
Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
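The cluster-count selection and per-song counting described above can be sketched with scikit-learn's GaussianMixture; the toy frames, the narrowed search range (the thesis searched 10 to 100 components), and the 240-second song duration are illustrative stand-ins.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Illustrative stand-in for the 16,800 x 12 matrix of sampled timbre frames:
# three well-separated synthetic groups of 200 frames each.
frames = np.vstack([rng.randn(200, 12) + 5 * k for k in range(3)])

# Fit a GMM for each candidate cluster count and keep the lowest BIC.
best_k, best_bic = None, np.inf
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic = k, bic

# Re-fit with the winning cluster count and save the per-cluster means.
final_gmm = GaussianMixture(n_components=best_k, random_state=0).fit(frames)

# Per-song step: assign each of a song's frames to its most likely cluster
# and count, normalized by the song's duration (here 240 seconds).
song_frames = rng.randn(50, 12)
counts = np.bincount(final_gmm.predict(song_frames), minlength=best_k) / 240.0
```

On the real data this same loop, run over 10 to 100 components, is what selected the 46 timbre clusters.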
26
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
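A minimal sketch of this feature assembly; the duplication factor of 4 is my assumption for illustration, since the exact number of timbre copies is not stated in the text.

```python
import numpy as np

def build_feature_vector(chord_freqs, timbre_freqs, timbre_copies=4):
    # 192 chord-change frequencies plus several copies of the 46 timbre
    # frequencies; 4 copies (4 * 46 = 184) roughly balances the 192 pitch
    # features, but the copy count here is an illustrative assumption.
    return np.concatenate([chord_freqs] + [timbre_freqs] * timbre_copies)

chords = np.zeros(192)   # toy per-second chord-change frequencies
timbre = np.ones(46)     # toy per-second timbre-category frequencies
features = build_feature_vector(chords, timbre)
print(features.shape)    # (376,)
```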
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently is, doing so uses the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor. After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
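The tag-based selection with an artist blacklist might look like the following sketch; the tag set, the blacklist contents beyond the Electric Light Orchestra example, and the helper name `keep_song` are illustrative.

```python
# Illustrative subset of EM genre tags, including the generic 'electronic'.
ELECTRONIC_TAGS = {"electronic", "house", "techno", "ambient", "trance"}
# Artists whose catalogs are mostly non-EM despite carrying EM tags.
BLACKLIST = {"Electric Light Orchestra"}

def keep_song(artist, artist_tags):
    # Exclude blacklisted artists, then keep any song whose artist carries
    # at least one electronic music tag.
    if artist in BLACKLIST:
        return False
    return any(tag.lower() in ELECTRONIC_TAGS for tag in artist_tags)

print(keep_song("Electric Light Orchestra", ["electronic", "rock"]))  # False
print(keep_song("Kraftwerk", ["Electronic", "synthpop"]))             # True
```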
The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Analysis section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
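The clustering runs themselves can be sketched as follows. The DPGMM class in scikit-learn that was available at the time has since been removed from the library; BayesianGaussianMixture with a Dirichlet Process prior is its current equivalent, and the toy data below stands in for the real 238-feature song vectors.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# Illustrative stand-in for the songs' 238-element per-second frequency
# vectors, which are tiny (often around 0.002) before rescaling.
X = rng.rand(300, 238) * 0.002
X *= 10  # rescale by the constant k = 10 so alpha stays in a workable range

dpgmm = BayesianGaussianMixture(
    n_components=50,  # upper limit on clusters, as in the thesis
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,  # the Dirichlet Process alpha
    covariance_type="diag",
    random_state=0,
).fit(X)
labels = dpgmm.predict(X)
print(len(np.unique(labels)))  # number of clusters actually used
```

Repeating the fit with `weight_concentration_prior` set to 0.05, 0.1, and 0.2 mirrors the three runs compared in the Findings section.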
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; new wave rock in 80s, then starting in 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats, kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which was particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song are remaining in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
- Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
- Cabaret Voltaire: orchestral electronic music
- Paul Horn: new age
- Brian Eno: ambient music
- Manuel Göttsching (Ashra): synth-heavy ambient music
- Killing Joke: industrial metal
- John Foxx: minimalist and dark electronic music
- Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machine, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

- Bill Nelson: minimalist house music
- Vangelis: orchestral compositions with electronic notes
- Rick Wakeman: rock compositions with spacy-sounding synths
- Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, leaving 19 clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters (10 out of 19) contained under 1,000 songs, and 3 contained under 100. Some of the clusters mapped easily to clusters under the other two α values, like cluster 1702, which contains Roland TR drum machine sounds and is comparable to cluster 2801. However, many of the other classifications seemed more dubious: not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect this difficulty. The y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 2801, for example. The only exceptions were clusters 30 and 34, but these contained so few songs that they represent only a small portion of the dataset. I therefore concluded that the Dirichlet Process with α = 0.2 clustered the songs inadequately. Overall, the clusters formed when α = 0.1 were the most meaningful, picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are those summarized in the findings for α = 0.05 and α = 0.1.
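The sensitivity to α described above can be reproduced directly in scikit-learn. The experiments in this thesis used the sklearn.mixture API available in 2016; the sketch below, which is an illustration on synthetic data rather than the thesis pipeline, uses the current BayesianGaussianMixture class, whose weight_concentration_prior plays the role of α:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# Synthetic stand-in for the per-song feature vectors (chord-change and
# timbre-category frequencies): three well-separated blobs in 4 dimensions.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 4)) for c in (0.0, 5.0, 10.0)])

used_counts = {}
for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=25,  # truncation level: an upper bound, not a choice of k
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,  # the concentration parameter alpha
        max_iter=500,
        random_state=0,
    )
    labels = dpgmm.fit_predict(X)
    used_counts[alpha] = len(np.unique(labels))  # clusters that received songs
```

Smaller values of α concentrate mass on fewer clusters, so sweeping α over 0.05, 0.1, and 0.2 changes how many of the 25 truncated components actually receive songs.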
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies for mitigating them, then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, various factors compromised it, some beyond my control and others within my control but unrealistic to address given my time and resources. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. Looking through the tracks, I did see many important artists, lending some credibility to the dataset. However, several other artists were conspicuously missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely, and the most recent songs came from 2010, meaning the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason I considered my thesis a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive given additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have produced only a small selection of EM songs alongside much more rock or other non-EM music. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
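A song-level filter of this kind could look like the sketch below; the tag set, the track IDs, and the shape of the tag data are hypothetical stand-ins, since the actual Last.fm dataset ships as a database keyed by MSD track ID:

```python
# Hypothetical sketch of song-level filtering: the tag dict below is a toy
# stand-in for the Last.fm dataset, which maps MSD track IDs to user tags.
EM_TAGS = {'house', 'techno', 'trance', 'dubstep', 'breakbeat', 'idm', 'ambient'}

def is_em_track(track_id, song_tags):
    """Keep a track only if its own tags (not its artist's) include an EM genre."""
    return any(t.lower() in EM_TAGS for t in song_tags.get(track_id, []))

song_tags = {
    'TRXXXXX1': ['techno', 'german'],  # an EM song by a mostly-rock artist
    'TRXXXXX2': ['rock', 'guitar'],    # a non-EM song by the same artist
}
kept = [tid for tid in song_tags if is_em_track(tid, song_tags)]
```

Filtering on the song's own tags rather than the artist's would keep the first track and drop the second, even though both have the same artist.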
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had clear semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
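As a simple stand-in for such methods, songs summarized by the normalized timbre-category histograms computed in Appendix A.2 could be compared with cosine similarity; the models in [15] are considerably more sophisticated, so this is only a sketch:

```python
import numpy as np

def timbre_similarity(hist_a, hist_b):
    """Cosine similarity between two songs' timbre-category histograms."""
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

# toy 4-category histograms (per-second counts, as in the preprocessing step)
song1 = [0.40, 0.10, 0.00, 0.50]
song2 = [0.38, 0.12, 0.00, 0.50]   # nearly the same timbre profile as song1
song3 = [0.00, 0.90, 0.10, 0.00]   # a very different profile
```

Similar songs score near 1 and dissimilar songs near 0, which would give a quantitative check on whether songs placed in the same cluster actually sound alike.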
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small research group without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are settled, the next steps would be to analyze the results further. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and to envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights on that information, this relatively new field of study will hopefully grow with them. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of values
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents, re.DOTALL):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre cluster centers found by the Dirichlet Process (one 12-element
# timbre vector per cluster)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (chord type, root)
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.
[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-35, Dec. 2005.
[Figure: chord change encoding. For two adjacent chords, calculate the change between them and increment the count in a table of chord change frequencies (192 possible chord changes); e.g. for F major to G major, a major-to-major change with step size 2, the chord shift code is 6, so chord_changes[6] += 1. The final result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
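The counting-and-normalization step just described can be sketched as follows (a minimal illustration of my own; the function name and the assumption that the 192 chord-change codes have already been computed for each adjacent chord pair are hypothetical):

```python
def chord_change_rates(chord_codes, duration_seconds, n_codes=192):
    """Count each chord-change code observed in a song and divide by the
    song's length, yielding chord changes per second for each code."""
    counts = [0] * n_codes
    for code in chord_codes:
        counts[code] += 1
    return [c / duration_seconds for c in counts]
```

For example, a 10-second song containing code 6 twice would yield a rate of 0.2 in position 6 of the vector.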
2.3.3 Timbre Preprocessing
For timbre I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060.

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year

A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
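The BIC-driven model selection above can be sketched with scikit-learn (a hedged sketch: the thesis used the scikit-learn GMM implementation of its time, and the function name and the small search range shown here are my own illustrative choices, not the exact code):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm(frames, cluster_range):
    """Fit a GMM for each candidate cluster count and keep the model
    with the lowest Bayes Information Criterion (BIC)."""
    best_model, best_bic = None, np.inf
    for k in cluster_range:
        gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model
```

In the thesis, `frames` would be the 16,800 × 12 matrix of sampled timbre frames and the search would run over cluster counts from 10 to 100, where the BIC was minimized at 46.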
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
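A minimal sketch of this feature construction (the function name and the default of 4 timbre copies are hypothetical; the thesis does not state how many duplicates were used):

```python
import numpy as np

def build_feature_vector(chord_rates, timbre_rates, timbre_copies=4):
    """Concatenate the 192 chord-change rates with several copies of the
    46 timbre-category rates so timbre carries comparable weight."""
    chord = np.asarray(chord_rates, dtype=float)
    timbre = np.asarray(timbre_rates, dtype=float)
    return np.concatenate([chord] + [timbre] * timbre_copies)
```

With 4 copies, each song's feature vector grows from 238 to 192 + 4 · 46 = 376 entries, with timbre now contributing roughly half of the dimensions.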
After this modification I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude to well over 10 million fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α over an extreme range with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value so that we can work in the appropriate range of α is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
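The behavior of α described above follows from the Dirichlet Process's predictive (Chinese restaurant process) rule, under which the probability that the next observation opens a brand-new cluster is α/(n + α). A small illustration (standard DP theory, not code from the thesis):

```python
def new_cluster_probability(alpha, n_songs_seen):
    """Probability that the next song starts a new cluster in a
    Dirichlet Process with concentration parameter alpha, after
    n_songs_seen songs have already been assigned."""
    return alpha / (n_songs_seen + alpha)
```

With a moderate α this probability shrinks as songs accumulate, while an α in the tens of millions keeps it near 1 even after all ~23,000 songs, which is why such values effectively assign each new song its own cluster.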
The goal of this thesis is to identify different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, putting semantic interpretations on what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clusterings, examining similarities and differences in the clusters formed in each scenario in the Analysis section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster paced, 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together under the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those of other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between clusters will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing different clusters: the y-axis values for all of the clusters are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment, and I close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the measurements on each graph for timbre, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
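As one illustration of the kind of song-to-song timbre comparison this points toward (my own sketch, not the method of [15]), two songs' 46-bin timbre category frequency vectors could be compared directly by cosine similarity:

```python
import math

def timbre_similarity(hist_a, hist_b):
    """Cosine similarity between two songs' timbre-category frequency
    vectors: 1.0 means identical timbre profiles, 0.0 means no overlap."""
    dot = sum(a * b for a, b in zip(hist_a, hist_b))
    norm_a = math.sqrt(sum(a * a for a in hist_a))
    norm_b = math.sqrt(sum(b * b for b in hist_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Averaging such pairwise similarities within versus between clusters would give one simple quantitative check of how well the Dirichlet Process separated the songs.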
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD and constantly updating it might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
```python
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for every EM song found in the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
```
A.2 Calculating Most Likely Chords and Timbre Categories
```python
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
```
A.3 Code to Compute Timbre Categories
```python
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
                    1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
                    1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
                    1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
                    1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
                    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
                    2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
```
A.4 Helper Methods for Calculations
```python
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 =  [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                        [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                        [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                        [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                        [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                        [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 =  [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                        [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                        [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                        [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                        [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                        [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
```
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify / Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
A final step I took to normalize the chord change data was to divide the numbers by
the length of the song, so that each song's number of chord changes was measured
per second.
2.3.3 Timbre Preprocessing
For timbre, I also used Mauch's model to find a meaningful way to compare timbre
uniformly across all songs [8]. After collecting all song metadata, I took a random
sample of 20 songs from each year, starting at 1970. The reason I forced the
sampling to 20 randomly sampled songs from each year, and did not take a random
sample of songs from all years at once, was to prevent bias towards any type of
sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011
than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and
the standard deviation of the years is σ = 7.060. A "random sample" over all songs
would almost certainly include a disproportionate number of more recent songs. In
order not to miss out on sounds that may be more prevalent in older songs, I
required a set number of songs from each year. Next, from each randomly selected
song I selected 20 random timbre frames, in order to prevent any biases in data
collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre
frames collected. Next, I clustered the timbre frames using a Gaussian Mixture
Model (GMM), varying the number of clusters from 10 to 100 and selecting the number
of clusters with the lowest Bayesian Information Criterion (BIC), a statistical
measure commonly used to select the best-fitting model. The BIC was minimized at 46
timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values
of each of the 12 timbre segments for each cluster formed. In the same way that
every song had the same 192 chord changes whose frequencies could be compared
between songs, each song now had the same 46 timbre clusters, but different
frequencies in each song. When reading in the metadata from each song, I calculated
the most likely timbre cluster each timbre frame belonged to, and kept a frequency
count of all of the possible timbre clusters observed in a song. Finally, as with
the pitch data, I divided all observed counts by the duration of the song in order
to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
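The year-stratified sampling described above can be sketched as follows. This is a minimal illustration rather than the thesis code; the `songs_by_year` structure and the toy data are invented for the example:

```python
import random

def sample_frames(songs_by_year, n_songs=20, k_frames=20, seed=0):
    """Draw up to n_songs songs per year, then k_frames random timbre
    frames per chosen song, so heavily represented recent years cannot
    dominate the sample."""
    rng = random.Random(seed)
    frames = []
    for year, songs in sorted(songs_by_year.items()):
        chosen = rng.sample(songs, min(n_songs, len(songs)))
        for song in chosen:
            timbre = song['timbre']  # list of 12-element timbre frames
            idx = rng.sample(range(len(timbre)), min(k_frames, len(timbre)))
            frames.extend(timbre[i] for i in idx)
    return frames

# toy data: 3 songs in 1995, each with 5 frames of zeros
toy = {1995: [{'timbre': [[0.0] * 12 for _ in range(5)]} for _ in range(3)]}
frames = sample_frames(toy, n_songs=2, k_frames=4)
print(len(frames))  # 2 songs x 4 frames = 8
```

With the real data, each year contributes at most 20 songs regardless of how many songs that year has in the dataset, which is exactly the de-biasing effect described above.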
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the
data. For each song, I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list, giving each song a total of 238
features. However, there is a problem with this setup: the pitch data will
inherently dominate the clustering process, since it contains almost three times as
many features as the timbre data. While there is no built-in way in scikit-learn's
DPGMM implementation to give different weights to each feature, I considered
another possibility to remedy this discrepancy: duplicating the timbre vector a
certain number of times and concatenating that to the feature set of each song.
While this strategy runs the risk of corrupting the feature set and turning it into
something that does not accurately represent each song, it is important to keep in
mind that even without duplicating the timbre vector, the feature set consists of
two separate feature sets concatenated to each other. Therefore, timbre duplication
appears to be a reasonable strategy to weight pitch and timbre more evenly.
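The timbre-duplication idea takes only a few lines. This is an illustrative sketch; the choice of four copies here is an assumption for the example, not the value used in the thesis:

```python
def build_features(chord_freqs, timbre_freqs, timbre_copies=4):
    """Concatenate the 192 chord-change frequencies with several copies
    of the 46 timbre-category frequencies, so that the two feature
    groups exert comparable influence on the clustering."""
    assert len(chord_freqs) == 192 and len(timbre_freqs) == 46
    return list(chord_freqs) + list(timbre_freqs) * timbre_copies

features = build_features([0.0] * 192, [0.0] * 46)
print(len(features))  # 192 + 4 * 46 = 376
```

Because Gaussian mixture clustering treats every coordinate symmetrically, repeating the timbre block is equivalent to up-weighting its contribution to the per-cluster likelihood.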
After this modification, I tweaked a few more parameters before obtaining my final
results. Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second, but it also had the undesired effect
of making the data too small. Timbre and pitch frequencies per second were almost
always less than 1.0, and many times hovered as low as 0.002 for nonzero values.
Because all of the values were very close to each other, using common values of α
in the range of 0.1 to 1000-2000 was insufficient to push the songs into different
clusters; as a result, every song fell into the same cluster. Increasing the value
of α by several orders of magnitude, to well over 10 million, fixed this, but such
a solution presented two problems. First, tuning α to experiment with different
ways to cluster the music would be problematic, since I would have to work with an
enormous range of possible values for α. Second, pushing α to such high values is
not appropriate for the Dirichlet Process. Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters, but a
value of α that high is in principle always assigning each new song to a new
cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a
much wider range of flexibility when assigning clusters. While clustering may be
possible by varying α an extreme amount with the data as it currently is, we would
be using the Dirichlet Process in a way it should mathematically not be used.
Therefore, multiplying all of the data by a constant value, so that we can work in
the appropriate range of α, is the ideal approach. After some experimentation, I
found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue
with some of the earlier songs. Since I had only artist genre tags, not tags
specific to each song, I chose songs based on whether any of the tags associated
with the artist fell under any electronic music genre, including the generic term
'electronic'. There were some bands, mostly older ones from the 1960s and 1970s
like Electric Light Orchestra, which had some electronic music but mostly featured
rock, funk, disco, or another genre. Given that these artists featured mostly
non-electronic songs, I decided to exclude them from my study and generate a
blacklist of these artists. While it was infeasible to look through every single
song and determine whether it was electronic or not, I was able to look over the
earliest songs in each cluster. These songs were the most important to verify as
electronic, because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking
for.
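The blacklist step amounts to a simple filter over the song metadata. The artist names and song structure below are illustrative only; the actual blacklist was compiled by hand from the earliest songs in each cluster:

```python
# hypothetical entries for illustration; the real list was built manually
ARTIST_BLACKLIST = {'Electric Light Orchestra'}

def filter_blacklisted(songs):
    """Drop songs whose artist is on the hand-built blacklist of
    mostly non-electronic acts that nevertheless carried electronic
    genre tags at the artist level."""
    return [s for s in songs if s['artist_name'] not in ARTIST_BLACKLIST]

songs = [{'artist_name': 'Electric Light Orchestra'},
         {'artist_name': 'Aphex Twin'}]
print(len(filter_blacklisted(songs)))  # 1
```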
The goal of this thesis is to identify the different groups into which EM songs are
clustered and to identify the most unique artists and genres. While the second task is
very simple, because it requires looking at the earliest songs in each cluster, the
effectiveness of the first is difficult to gauge. While I can look at the average chord change
and timbre category frequencies in each cluster, as well as other metadata, putting
more semantic interpretations on what the music actually sounds like, and determining
whether the music is clustered properly, is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clusterings, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit on the number of components, or clusters, to 50. The values of α I used
resulted in 9, 14, and 19 clusters, respectively.
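The behavior of α described in this section can be made concrete through the Chinese Restaurant Process view of the Dirichlet Process: each new song joins an existing cluster with probability proportional to that cluster's current size, or starts a new cluster with probability proportional to α. The following is a minimal stdlib-only simulation (an illustration of the prior's behavior only, not the thesis's actual model fit), showing why moderate α yields a manageable number of clusters while extreme α tends toward one cluster per song:

```python
import random

def crp_num_clusters(n_songs, alpha, seed=0):
    """Simulate Chinese Restaurant Process seating: returns how many
    clusters form after n_songs sequential assignments."""
    rng = random.Random(seed)
    cluster_sizes = []
    for i in range(n_songs):
        # probability of starting a brand-new cluster: alpha / (i + alpha)
        if rng.random() < alpha / (i + alpha):
            cluster_sizes.append(1)
        else:
            # join an existing cluster with probability proportional to its size
            r = rng.uniform(0, i)
            acc = 0
            for j, size in enumerate(cluster_sizes):
                acc += size
                if r < acc:
                    cluster_sizes[j] += 1
                    break
    return len(cluster_sizes)

few = crp_num_clusters(20000, 0.1)
many = crp_num_clusters(20000, 1000.0)
# Small alpha produces only a handful of clusters over 20,000 songs;
# very large alpha produces thousands.
```

Note that in the actual DP mixture model the number of populated clusters also depends on the data likelihood, not just this prior, which is why the fits above produced 9 to 19 clusters rather than the one or two this prior alone would suggest for small α.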
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are
skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table listing each cluster formed, the number of songs in that cluster, and descriptions
of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is
shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
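The per-cluster averages plotted in these figures amount to grouping the songs' feature vectors (chord-change and timbre-category frequencies) by cluster label and averaging. A minimal stdlib sketch with toy data (variable names are illustrative):

```python
from collections import defaultdict

def cluster_profiles(features, labels):
    """Average feature vector (chord-change and timbre-category
    frequencies) for each cluster label."""
    sums, counts = {}, defaultdict(int)
    for row, c in zip(features, labels):
        if c not in sums:
            sums[c] = list(row)
        else:
            sums[c] = [a + b for a, b in zip(sums[c], row)]
        counts[c] += 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}

# Toy example: 4 songs, 3 features, two clusters.
features = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0], [0.0, 5.0, 0.0], [0.0, 7.0, 0.0]]
labels = [0, 0, 1, 1]
profiles = cluster_profiles(features, labels)
# profiles[0] == [2.0, 0.0, 2.0]; profiles[1] == [0.0, 6.0, 0.0]
```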
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and since neither sounded unique,
I discarded those clusters). Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them,
for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
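Identifying such candidates reduces to sorting each cluster's members by release year and inspecting the head of each list. A sketch with illustrative field names and toy data:

```python
from collections import defaultdict

def earliest_songs(songs, labels, n=2):
    """Return the n earliest-released songs in each cluster -- the
    candidates for artists who were novel for their time."""
    by_cluster = defaultdict(list)
    for song, c in zip(songs, labels):
        by_cluster[c].append(song)
    return {c: sorted(members, key=lambda s: s['year'])[:n]
            for c, members in by_cluster.items()}

songs = [{'title': 'a', 'year': 1994}, {'title': 'b', 'year': 1979},
         {'title': 'c', 'year': 2001}, {'title': 'd', 'year': 1985}]
labels = [28, 28, 28, 9]
early = earliest_songs(songs, labels)
# early[28] lists 'b' (1979) then 'a' (1994); early[9] lists 'd'
```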
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. Under the Dirichlet
Process, however, these songs were lumped together with the common theme of dense,
melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists among the earlier songs are Ashra and Jon Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, has a more normal year
distribution relative to the entire MSD distribution and consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then rises rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer and pioneer in ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note
rhythm that, combined with the ethereal synths and their chord progressions,
sounded very similar to trance music. In fact, I found that, stylistically,
trance music was comparable to house and ambient music increased in speed. 'Trance'
was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres.
Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change, type 60 to minor → minor with no note change, type
120 to dominant 7th major → dominant 7th major with no note change, and type 180 to
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that adjacent chords in a song remain in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to interpret intuitively. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in mine. I ended up comparing my subjective
summaries of each cluster to the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data are very similar for each cluster. This problem does not
occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference between existing
accounts and mine, however, is that from a quantitatively computed perspective I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
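The category indices quoted above (0, 60, 120, 180) follow from the chord-change encoding used in Appendix A.2, where a chord is a (key type, root note) pair and the 192 categories combine a key-type transition with a root shift. A quick check of the arithmetic:

```python
def chord_change_category(c1, c2):
    """Chord-change category index following the encoding in Appendix A.2.
    A chord is a (key_type, root) pair: key_type 1=major, 2=minor,
    3=dominant 7th major, 4=dominant 7th minor; root is a pitch class 0-11."""
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]        # 1..16 key-type transitions
    return 12 * (key_shift - 1) + note_shift   # one of 192 categories

# The four 'same key type, no note change' categories quoted in the text:
assert chord_change_category((1, 0), (1, 0)) == 0     # major -> major
assert chord_change_category((2, 5), (2, 5)) == 60    # minor -> minor
assert chord_change_category((3, 9), (3, 9)) == 120   # dom7 major -> dom7 major
assert chord_change_category((4, 2), (4, 2)) == 180   # dom7 minor -> dom7 minor
```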
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value but also comparing those clusters
to the ones formed for other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between clusters will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these consisted of
only one song each, and upon listening, neither song sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster
3 of the α = 0.1 run contained a similar number of songs, and a similar
distribution of release years, to cluster 9 of the α = 0.05 run. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05).
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin", for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 run differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked
up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1)
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres can be grouped
together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6 (α = 0.1), has a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre and chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds,
so it makes sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR drum machines, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to share the same peak timbre
category, were noted for containing strong and repetitive beats. For this value of α, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
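Comparisons like these between runs (e.g. cluster 3 at α = 0.1 versus cluster 9 at α = 0.05) can be made less subjective by checking where each cluster's songs land in the other run. A minimal sketch, assuming both runs labeled the same ordered list of songs (function and variable names are illustrative):

```python
from collections import Counter

def best_match(labels_a, labels_b):
    """For each cluster in run A, find the run-B cluster sharing the most
    songs, plus the fraction of run-A members it captures."""
    members = {}
    for la, lb in zip(labels_a, labels_b):
        members.setdefault(la, []).append(lb)
    mapping = {}
    for la, bs in members.items():
        (lb, count), = Counter(bs).most_common(1)
        mapping[la] = (lb, count / len(bs))
    return mapping

# Toy example: run A's cluster 3 mostly lands in run B's cluster 9.
labels_a = [3, 3, 3, 3, 7, 7]
labels_b = [9, 9, 9, 2, 1, 1]
m = best_match(labels_a, labels_b)
# m[3] == (9, 0.75); m[7] == (1, 1.0)
```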
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters. 3 of these clusters contained only one song each; upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to differentiate subjectively, this run was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters from the other two α values, like cluster 17 (α = 0.2), which contains Roland TR
drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
in distinguishing clusters. The y-axis values on all of these charts are quite small,
implying that many of the timbre values averaged out because the songs in each cluster
were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions
were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters that a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table for α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings for
α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; I then offer potential paths for researchers to build upon my
experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the
electronic music available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included had
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM, and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
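Such a song-level filter would look roughly like the following sketch; the tag names and record layout are illustrative and do not reflect the actual Last.fm schema:

```python
# Hedged sketch: filter at the song level using per-track tags instead of
# artist-level tags. Genre list and song records are illustrative.
EM_GENRES = {'house', 'techno', 'trance', 'ambient', 'downtempo', 'electronic'}

def is_em_song(song):
    """Keep a song only if one of its own tags is an EM genre, rather than
    relying on tags attached to the artist as a whole."""
    return any(tag.lower() in EM_GENRES for tag in song.get('tags', []))

songs = [
    {'title': 'Mr. Blue Sky', 'tags': ['rock', 'pop']},        # dropped
    {'title': 'Oxygene Pt. 4', 'tags': ['Ambient', 'electronic']},
]
em_only = [s for s in songs if is_em_song(s)]
# em_only keeps only the second song
```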
Another, more addressable, weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed were nontrivial: I could not only
eyeball the timbre measurements on each graph, as I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements across all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is at clustering songs into distinct categories.
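As one simple instance of comparing songs by timbre similarity (a sketch under the assumption that each song is summarized by a single average timbre vector; this is not the exact method of [15]):

```python
import math

def cosine_similarity(t1, t2):
    """Cosine similarity between two songs' average timbre vectors."""
    dot = sum(a * b for a, b in zip(t1, t2))
    n1 = math.sqrt(sum(a * a for a in t1))
    n2 = math.sqrt(sum(b * b for b in t2))
    return dot / (n1 * n2)

song_a = [0.2, 0.8, 0.1]
song_b = [0.2, 0.8, 0.1]   # identical timbre profile
song_c = [0.9, 0.1, 0.5]
sim_ab = cosine_similarity(song_a, song_b)
sim_ac = cosine_similarity(song_a, song_c)
# sim_ab is (essentially) 1.0 and larger than sim_ac
```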
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD, and constantly updating it, might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, and the
songs accessed from the dataset and methods for comparing songs to each other are in
place, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows and groups such as Spotify amass greater amounts of information
and deeper insights into that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been analyzed mostly from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of each electronic music song out of the
Million Song Dataset and writes it, sorted by year, to a text file.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the Dirichlet Process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# regex matching one song's metadata dict; pattern partially lost in
# typesetting and reconstructed here
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0]-1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean value of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# regex matching one song's metadata dict; pattern partially lost in
# typesetting and reconstructed here
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((
                stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01)
                * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year
a frequency count of all of the possible timbre clusters observed in a song. Finally,
as with the pitch data, I divided all observed counts by the duration of the song in
order to normalize each song's timbre counts.
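This counting-and-normalization step can be sketched as follows. The function name and the example segment values are hypothetical, and the 46-category count is taken from the final feature set described in Chapter 3:

```python
import numpy as np

def timbre_feature(segment_categories, duration, n_categories=46):
    # Count how many segments fall in each timbre category, then divide by
    # the song's duration so the feature is a rate (occurrences per second).
    counts = np.bincount(np.asarray(segment_categories), minlength=n_categories)
    return counts / float(duration)

# Hypothetical song: four segments, 200 seconds long.
feat = timbre_feature([0, 3, 3, 45], 200.0)
```

Because every song's counts are expressed per second, a long song and a short song with the same sound palette map to comparable feature vectors.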
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the
data. For each song, I concatenated the 192-element chord change frequency list
and the 46-element timbre category frequency list, giving each song a total of 238
features. However, there is a problem with this setup: the pitch data will inherently
dominate the clustering process, since it contains almost 3 times as many features
as timbre. While there is no built-in function in scikit-learn's DPGMM process to
give different weights to each feature, I considered another possibility to remedy
this discrepancy: duplicating the timbre vector a certain number of times and
concatenating that to the feature set of each song. While this strategy runs the risk
of corrupting the feature set and turning it into something that does not accurately
represent each song, it is important to keep in mind that even without duplicating
the timbre vector, the feature set consists of two separate feature sets concatenated
to each other. Therefore, timbre duplication appears to be a reasonable strategy to
weight pitch and timbre more evenly.
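A sketch of this duplication strategy follows. The helper name and the choice of four copies are illustrative assumptions (four copies of the 46 timbre features gives 184 features, roughly matching the 192 pitch features); the text does not fix the exact number of copies here:

```python
import numpy as np

def assemble_features(chord_change_freqs, timbre_freqs, timbre_copies=4):
    # Concatenate the 192 chord-change frequencies with the 46 timbre-category
    # frequencies, tiling the timbre block so its share of the feature vector
    # is comparable to the pitch block's.
    pitch = np.asarray(chord_change_freqs, dtype=float)
    timbre = np.asarray(timbre_freqs, dtype=float)
    return np.concatenate([pitch] + [timbre] * timbre_copies)

# Hypothetical song: zero chord-change rates, uniform timbre rates.
x = assemble_features(np.zeros(192), np.ones(46))
```

Since the Gaussian mixture treats every coordinate symmetrically, repeating a block of coordinates is a crude but simple way to increase that block's influence on the cluster assignments.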
After this modification, I tweaked a few more parameters before obtaining my
final results. Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second, but it also had the undesired effect
of making the data too small. Timbre and pitch frequencies per second were almost
always less than 1.0, and many times hovered as low as 0.002 for nonzero values.
Because all of the values were very close to each other, using common values of α
in the range of 0.1 to 1000–2000 was insufficient to push the songs into different
clusters. As a result, every song fell into the same cluster. Increasing the value
of α by several orders of magnitude to well over 10 million fixed this, but the
solution presented two problems. First, tuning α to experiment with different
ways to cluster the music would be problematic, since I would have to work with
an enormous range of possible values for α. Second, pushing α to such high values
is not appropriate for the Dirichlet Process. Extremely high values of α indicate a
Dirichlet Process that will try to disperse the data into different clusters, but a value
of α that high is in principle always assigning each new song to a new cluster. On
the other hand, varying α between 0.1 and 1000, for example, presents a much wider
range of flexibility when assigning clusters. While this may be possible by varying
the values of α an extreme amount with the data as it currently is, we would be using
the Dirichlet Process in a way it should mathematically not be used. Therefore,
multiplying all of the data by a constant value so that we can work in the appropriate
range of α is the ideal approach. After some experimentation, I found that k = 10 was
an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found out
that there was a slight issue with some of the earlier songs. Since I had only artist
genre tags, not specific song tags for each song, I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre, including
the generic term 'electronic'. There were some bands, mostly older ones from the
1960s and 1970s like Electric Light Orchestra, which had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured
mostly non-electronic songs, I decided to exclude them from my study and generated
a blacklist of these music artists. While it was infeasible to look through
every single song and determine whether it was electronic or not, I was able to look
over the earliest songs in each cluster. These songs were the most important to verify
as electronic, because early non-electronic songs could end up forming new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are
clustered and to identify the most unique artists and genres. While the second task is
very simple, because it requires looking at the earliest songs in each cluster, the first
is difficult to gauge the effectiveness of. While I can look at the average chord change
and timbre category frequencies in each cluster, as well as other metadata, putting
more semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clustering in each case, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit of components, or clusters, allowed to 50. The values of α I used
resulted in 9, 14, and 19 clusters, respectively.
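The role of α can be illustrated with a small Chinese-restaurant-process simulation, an equivalent sequential view of the Dirichlet process: each new song joins an existing cluster with probability proportional to that cluster's size, or starts a new cluster with probability proportional to α. The song counts and α values below are illustrative only, not the thesis's actual runs:

```python
import random

def crp_cluster_count(n_songs, alpha, seed=0):
    # Chinese restaurant process: song n joins existing cluster i with
    # probability size_i / (n + alpha), or opens a new cluster with
    # probability alpha / (n + alpha). Larger alpha -> more clusters.
    rng = random.Random(seed)
    sizes = []
    for n in range(n_songs):
        r = rng.random() * (n + alpha)
        acc = 0.0
        for i, s in enumerate(sizes):
            acc += s
            if r < acc:
                sizes[i] += 1
                break
        else:
            sizes.append(1)
    return len(sizes)
```

On average the number of clusters grows roughly like α·log(n), which is why a small α keeps nearly all songs in a handful of clusters while a large α fragments them.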
3.2 Findings
321 α=005
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are
skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions
of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is
shown below.
Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; New Wave rock in 80s, then starting in 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
322 α=01
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs and they did not sound unique, so
I discarded them from the clusters). Again, the song distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin, rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
323 α=02
With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted
of 1 song each none of which were particularly unique-sounding so I discarded them
for a total of 19 significant clusters Again the song distributions timbre and pitch
distributions and cluster descriptions are shown below
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process these songs were lumped together under the common theme of dense, melodic
melodies (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times the normal speed produced a thumping, fast-paced 16th
note rhythm that, combined with the ethereal synths and their chord
progressions, sounded very similar to trance music. In fact, I found that stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
music was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself, but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change; type 60, minor → minor with no note change; type
120, dominant 7th major → dominant 7th major with no note change; and type 180,
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that chords in the song occurring next to each other are remaining in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to intuitively interpret. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in mine. I ended up comparing my subjective
summaries of each cluster and comparing the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data is very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music; complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
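The chord-change type numbers quoted above come straight from the index arithmetic used in the preprocessing code (Appendix A.2): index = 12·(key_shift − 1) + note_shift, where key_shift = 4·(c1 − 1) + c2 runs over the four chord qualities of the outgoing and incoming chords. A small sketch, assuming that encoding (it is an illustration, not part of the thesis code), decodes the four special indices:

```python
# Decode a chord-change index back into (from quality, to quality, note shift),
# assuming the encoding index = 12*(key_shift - 1) + note_shift with
# key_shift = 4*(c1 - 1) + c2, as in Appendix A.2.
CHORD_TYPES = ['major', 'minor', 'dominant 7th major', 'dominant 7th minor']

def decode_chord_change(index):
    key_shift_minus_1, note_shift = divmod(index, 12)
    # key_shift - 1 = 4*(c1 - 1) + (c2 - 1), so divmod by 4 recovers the qualities
    from_type, to_type = divmod(key_shift_minus_1, 4)
    return CHORD_TYPES[from_type], CHORD_TYPES[to_type], note_shift

for t in (0, 60, 120, 180):
    print(t, decode_chord_change(t))
# each of 0, 60, 120, and 180 decodes to a same-quality change with note shift 0
```

This also makes clear why exactly four indices stand out at multiples of 60: they are the four diagonal (same-quality, same-root) transitions of the 192-category grid.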
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value, but also comparing the clusters formed
to those for other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster
3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar
distribution of release years, to cluster 9_0.05. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 group differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked
up more nuanced instrumentation and mood differences. For example, cluster 16_0.1
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together based on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds,
so it would make sense that cluster 5, which was mainly calm new age, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this clustering, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters; 3 of these contained only one song each, and upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR
drum machine sounds and is comparable to cluster 28_0.1. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
seem to vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing different clusters. The y-axes for all of the years are quite small,
implying that many of the timbre values averaged out because the songs were quite
different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as cluster 28_0.1, for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small amount of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
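The pattern observed across these three runs, that larger α yields more and finer clusters, is the defining behavior of the Dirichlet Process itself rather than an artifact of this dataset. A minimal Chinese Restaurant Process simulation (a standard construction of the Dirichlet Process; this sketch is illustrative and separate from the thesis code) shows the expected number of clusters growing with α:

```python
import random

def crp_partition(n, alpha, seed):
    """Seat n customers by the Chinese Restaurant Process; return table sizes."""
    rng = random.Random(seed)
    tables = []
    for i in range(n):
        # open a new table with probability alpha / (i + alpha)
        if rng.random() < alpha / (i + alpha):
            tables.append(1)
        else:
            # otherwise join an existing table with probability proportional to its size
            r = rng.uniform(0, i)
            acc = 0.0
            for k, size in enumerate(tables):
                acc += size
                if r < acc:
                    tables[k] += 1
                    break
    return tables

def avg_clusters(alpha, n=1000, trials=200):
    return sum(len(crp_partition(n, alpha, s)) for s in range(trials)) / trials

# more clusters as alpha grows, mirroring the alpha = 0.05 / 0.1 / 0.2 runs
print(avg_clusters(0.05), avg_clusters(0.1), avg_clusters(0.2))
```

For n items the prior expected number of tables is roughly α·ln(1 + n/α), so the prior tendency alone grows with α; in the actual model, the likelihood of the song features then pulls the realized number of clusters up or down from this tendency.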
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; then I offer potential paths for researchers to build upon my
experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to all of the electronic
music that was available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included contained
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansions in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept: although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayes Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the measurements on each graph for timbre, as I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is at effectively clustering songs into distinct categories.
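As a concrete example of the kind of quantitative song-to-song comparison such methods would enable, the per-song timbre-category histograms already computed in Appendix A.2 (`timbre_cat_counts`) could be compared directly with cosine similarity. This is a hypothetical sketch with made-up vectors, not something implemented in the thesis:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two timbre-category histograms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# hypothetical per-song timbre_cat_counts vectors
song_a = [4, 0, 1, 3]
song_b = [8, 0, 2, 6]   # same timbre profile, different overall scale
song_c = [0, 5, 0, 1]
print(cosine_similarity(song_a, song_b))  # approximately 1.0: same profile
print(cosine_similarity(song_a, song_c))
```

Because cosine similarity ignores overall scale, two songs with the same timbre profile but different durations or normalizations still score near 1, which is the property needed when comparing the duration-normalized counts produced by the preprocessing code.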
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and procedures,
evaluations of whether clustering was effective, and closer musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD, and constantly updating it, might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small group research project without
extensive funding and influence. Once these problems are resolved, and the dataset,
the songs accessed from it, and methods for comparing songs to each other are
in place, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists, for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even more, and groups such as Spotify amass greater amounts of information
and deeper insights on that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been analyzed mostly from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for every electronic song in the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
    'industrial', 'synthpop', 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub(r'/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean helper (applied to zipped lists of lists)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# match one song's dict literal (regex reconstructed; extraction garbled the original)
for json_object_match in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# match one song's dict literal (regex reconstructed; extraction garbled the original)
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
   1.32408871e+00 -1.66172505e+00  1.00560074e+00
  -8.82308160e-01 -5.95708043e-01 -7.27283590e-01
  -1.03975499e+00 -1.86653334e-02  1.39449745e+00]
 [ 3.20587677e+00 -2.84451104e+00  8.54849957e+00
  -4.44001235e-01  1.04202144e+00  7.35333682e-01
  -2.48763292e+00  7.38931361e-01 -1.74185596e+00
  -1.07581842e+00  2.05759299e-01 -8.20483513e-01]
 [ 3.31279737e+00 -5.08655734e-01  6.61530870e+00
   1.16518280e+00  4.74499155e+00 -2.31536191e+00
  -1.34016130e+00 -7.15381712e-01  2.78890594e+00
   2.04189275e+00 -3.80003033e-01  1.16034914e+00]
 [ 1.79522019e+00 -8.13534697e-02  4.37167420e-01
   2.26517020e+00  8.85377295e-01  1.07481514e+00
  -7.25322296e-01 -2.19309506e+00 -7.59468916e-01
  -1.37191387e+00  2.60097913e-01  9.34596450e-01]
 [ 3.50400906e-01  8.17891485e-01 -8.63487084e-01
  -7.31760701e-01  9.70320805e-02 -3.60023996e-01
  -2.91753495e-01 -8.03073817e-02  6.65930095e-02
   1.60093340e-01 -1.29158086e-01 -5.18806100e-02]
 [ 2.25922929e-01  2.78461593e-01  5.39661393e-02
  -2.37662670e-02 -2.70343295e-02 -1.23485570e-01
   2.31027499e-03  5.87465112e-05  1.86127188e-02
   2.83074747e-02 -1.87198676e-04  1.24761782e-02]
 [ 4.53615634e-01  3.18976020e+00 -8.35029351e-01
   7.84124578e+00 -4.43906795e-01 -1.78945492e+00
  -1.14521031e+00  1.00044304e+00 -4.04084981e-01
  -4.86030348e-01  1.05412721e-01  5.63666445e-02]
 [ 3.93714086e-01 -3.07226477e-01 -4.87366619e-01
  -4.57481697e-01 -2.91133171e-04 -2.39881719e-01
  -2.15591352e-01 -1.21332941e-01  1.42245002e-01
   5.02984582e-02 -8.05878851e-03  1.95534173e-01]
 [ 1.86913010e-01 -1.61000977e-01  5.95612425e-01
   1.87804293e-01  2.22064227e-01 -1.09008289e-01
   7.83845058e-02  5.15228647e-02 -8.18113578e-02
  -2.37860551e-02  3.41013800e-03  3.64680417e-02]
 [ 3.32919314e+00 -2.14341251e+00  7.20913997e+00
   1.76143734e+00  1.64091808e+00 -2.66887649e+00
  -9.26748006e-01 -2.78599285e-01 -7.39434005e-01
  -3.87363085e-01  8.00557250e-01  1.15628886e+00]
 [ 4.76496444e-01 -1.19334793e-01  3.09037235e-01
  -3.45545294e-01  1.30114716e-01  5.06895559e-01
   2.12176840e-01 -4.14296750e-03  4.52439064e-02
  -1.62163990e-02  6.93683152e-02 -5.77607592e-03]
 [ 3.00019324e-01  5.43432074e-02 -7.72732930e-01
   1.47263806e+00 -2.79012581e-02 -2.47864869e-01
  -2.10011388e-01  2.78202425e-01  6.16957205e-02
  -1.66924986e-01 -1.80102286e-01 -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw MSD data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (quality, root index)
    most_likely_chord = (1, 1)
    chord_templates = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs),
    ]
    for quality, templates, means, stdevs in chord_templates:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            # correlation between the chord template and the observed pitch vector
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                       ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (quality, idx)
    return most_likely_chord
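The (quality, root) pairs returned by find_most_likely_chord feed the 192 chord-change categories used as features in Chapter 3. The thesis states that types 0, 60, 120, and 180 are the no-note-change transitions for major, minor, dominant-7th-major, and dominant-7th-minor chords; one encoding consistent with those four anchor points (and with 192 = 4 x 4 x 12 total categories) is sketched below. The function name and exact index layout are a reconstruction, not taken verbatim from the appendix.

```python
# Hypothetical reconstruction of the chord-change feature index. Chords are
# (quality, root) pairs with quality in 1..4 (major, minor, dom7-major,
# dom7-minor, matching find_most_likely_chord) and root in 0..11.
def chord_change_index(from_chord, to_chord):
    from_quality, from_root = from_chord
    to_quality, to_root = to_chord
    interval = (to_root - from_root) % 12  # 0 means no note change
    return ((from_quality - 1) * 4 + (to_quality - 1)) * 12 + interval
```

Under this layout a major-to-major transition with no note change maps to index 0, minor-to-minor to 60, and so on, matching the four types named in the analysis.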
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / \
                   ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.
Chapter 3
Results
3.1 Methodology
After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost three times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
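As a concrete sketch of this weighting strategy (the copy count of 4 is a hypothetical choice that brings the timbre block to roughly the same length as the pitch block; the thesis does not fix the exact number here):

```python
import numpy as np

def build_feature_vector(chord_change_freqs, timbre_cat_freqs, timbre_copies=4):
    """Concatenate the 192 chord-change frequencies with several copies of
    the 46 timbre-category frequencies so pitch does not swamp timbre."""
    chord_change_freqs = np.asarray(chord_change_freqs)
    timbre_cat_freqs = np.asarray(timbre_cat_freqs)
    assert chord_change_freqs.shape == (192,) and timbre_cat_freqs.shape == (46,)
    return np.concatenate([chord_change_freqs] + [timbre_cat_freqs] * timbre_copies)
```

With four copies, 184 of the 376 features come from timbre, so the two feature families carry comparable weight in the Euclidean geometry the mixture model sees.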
After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While this may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist indicating these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
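The role α plays above can be illustrated with a small simulation of the Chinese Restaurant Process, the sequential view of the Dirichlet Process prior: each new song joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to α. This is only an illustration of the prior's behavior, not the thesis's actual fitting code:

```python
import random

def crp_cluster_count(n_songs, alpha, seed=0):
    """Simulate cluster assignments under a Chinese Restaurant Process
    with concentration parameter alpha; return the number of clusters."""
    rng = random.Random(seed)
    cluster_sizes = []
    for i in range(n_songs):
        # song i starts a new cluster with probability alpha / (i + alpha)
        r = rng.uniform(0.0, i + alpha)
        if r < alpha:
            cluster_sizes.append(1)
        else:
            # otherwise it joins cluster j with probability size_j / (i + alpha)
            acc = alpha
            for j, size in enumerate(cluster_sizes):
                acc += size
                if r < acc:
                    cluster_sizes[j] += 1
                    break
    return len(cluster_sizes)
```

For a fixed dataset size, the expected number of clusters grows roughly like α log(1 + n/α), which is why a tiny α collapses everything into one cluster and an enormous α assigns nearly every song its own cluster.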
The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each category, as well as other metadata, putting more semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each category, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
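The fitting step itself can be sketched as follows. The thesis used scikit-learn's DPGMM class, which has since been removed from the library; the sketch below uses its modern replacement, BayesianGaussianMixture with a Dirichlet Process prior, where weight_concentration_prior plays the role of α. The function name and the call pattern are illustrative, not the thesis's code:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def cluster_songs(features, alpha, max_clusters=50, seed=0):
    """Fit a truncated Dirichlet Process Gaussian mixture to the song
    feature matrix and return one cluster label per song."""
    model = BayesianGaussianMixture(
        n_components=max_clusters,  # upper limit on clusters, as in the thesis
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,  # the concentration parameter alpha
        random_state=seed,
    )
    return model.fit_predict(features)
```

Unused components receive negligible mixture weight, so the number of distinct labels returned is the effective cluster count, analogous to the 9, 14, and 19 clusters reported above.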
3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin; rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in the song are remaining in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time.
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of the years the songs were released, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axis values for all of the clusters are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 performed an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and resulting in clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansions in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
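A song-level filter of the kind described would only take a few lines; the sketch below is hypothetical (the genre whitelist and tag format are illustrative, not the Last.fm dataset's actual schema):

```python
# Hypothetical song-level genre filter. EM_GENRES is an illustrative
# whitelist; a real list would cover many more genres.
EM_GENRES = {"electronic", "house", "techno", "trance", "ambient", "idm"}

def is_em_song(song_tags):
    """Keep a track only if its own tags (not its artist's) match the whitelist."""
    return any(tag.lower() in EM_GENRES for tag in song_tags)

tracks = {
    "Track A": ["Acid House", "electronic"],   # kept: 'electronic' matches
    "Track B": ["classic rock", "blues"],      # dropped: no EM tag
}
em_tracks = [title for title, tags in tracks.items() if is_em_song(tags)]
```

Filtering on the track's own tags, rather than its artist's, would remove exactly the kind of mostly-rock artist (e.g. Electric Light Orchestra) that the blacklist in Chapter 3 handled manually.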
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had clear semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the timbre measurements on each graph, as I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successfully the Dirichlet
Process clusters songs into distinct categories.
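As a minimal illustration of the song-to-song timbre comparison referenced above: one simple measure is the cosine similarity of two songs' timbre-category histograms (assumed here to be vectors like the `timbre_cat_counts` computed in Appendix A.2; the function name and toy data are my own, and this is far cruder than the models in [15]):

```python
import math

def timbre_similarity(counts_a, counts_b):
    """Cosine similarity between two songs' timbre-category histograms.

    1.0 means identical timbre profiles (up to overall length/loudness),
    0.0 means no overlapping timbre categories at all.
    """
    dot = sum(a * b for a, b in zip(counts_a, counts_b))
    norm_a = math.sqrt(sum(a * a for a in counts_a))
    norm_b = math.sqrt(sum(b * b for b in counts_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

song1 = [4, 0, 2, 0]   # toy 4-category histograms
song2 = [2, 0, 1, 0]   # same profile as song1, half the magnitude
song3 = [0, 3, 0, 5]   # disjoint timbre categories
print(timbre_similarity(song1, song2))  # ~1.0
print(timbre_similarity(song1, song3))  # 0.0
```

A measure like this would give a quantitative check on the Dirichlet Process output: songs assigned to the same cluster should, on average, score higher against each other than against songs in other clusters.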
4.2 Future Work
Future work in this area (quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists) would involve tighter definitions and
procedures, evaluations of whether clustering was effective, and closer scrutiny of the
music itself. All of the weaknesses mentioned in the previous section, barring perhaps
the set of songs available in the Million Song Dataset, can be addressed with extensions
and modifications to the code base I created. The larger issue of building an effective
corpus of music data for the MSD and constantly updating it might be addressed by
soliciting such data from an organization like Spotify, but such an endeavor is very
ambitious and beyond the scope of any individual or small-group research project
without extensive funding and influence. Once these problems are resolved, and the
songs accessed from the dataset and methods for comparing songs to each other are in
place, the next steps would be to further analyze the results. How do the most unique
artists for their time compare to the most popular artists? Is there considerable
overlap? How long does it take for a style to grow in popularity, if it ever does? And
lastly, how can these findings be used to compose new genres of music and predict
who and what will become popular in the future? All of these questions may require
supplementary information sources, for example regarding the popularity of songs
and artists, and many of these additional pieces of information can be found on the
website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows and groups such as Spotify amass greater amounts of information, and deeper
insights into that information, this relatively new field of study will hopefully grow
with it. EM is a dynamic, energizing, and incredibly expressive type of music, and
understanding it from a quantitative perspective pays respect to what has up until
now been analyzed mostly from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import time
import glob
import re
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
                    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
                    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
                    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
                    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
                    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [[ 1.38679881e-01,  3.95702571e-02,  2.65410235e-02,  7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
                     8.71851698e-03, -1.17595855e-02,  1.07227900e-02,  8.75951680e-03,  5.40391877e-03,  6.17638908e-03],
                   [ 3.14344510e+00,  1.17405599e-01,  4.08053561e+00, -1.77934450e+00,  2.93367968e+00, -1.35597928e+00,
                    -1.55129489e+00,  7.75743158e-01,  6.42796685e-01,  1.40794256e-01,  3.37716831e-01, -3.27103815e-01],
                   [ 3.56548165e-01,  2.73288705e+00,  1.94355982e+00,  1.06892477e+00,  9.89739475e-01, -8.97330631e-02,
                     8.73234495e-01, -2.00747009e-03,  3.44488367e-01,  9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
                   [ 4.22442037e-01,  4.14115783e-01,  1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
                    -6.83151409e-02,  9.86816882e-02,  2.43219098e-02,  6.93558977e-02,  6.80121418e-03,  3.97485360e-02],
                   [ 1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,  1.92334219e-01, -2.83421048e-01,
                     2.15787541e-01,  1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
                   [ 1.96565167e-01,  4.98702717e-02, -3.43697282e-01,  2.54170701e-01,  1.12441266e-02,  1.54740401e-01,
                    -4.70447408e-02,  8.10868802e-02,  3.03736697e-03,  1.43974944e-03, -2.75044913e-02,  1.48634678e-02],
                   [ 2.21364497e-01, -2.96205105e-01,  1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
                    -1.38139882e-01, -5.54936599e-02,  1.66886836e-01,  6.46238260e-02,  1.24093863e-02, -2.09274345e-02],
                   [ 2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,  4.98638770e-02, -1.26572488e-01,
                    -1.11181799e-01,  3.25075635e-02,  2.01416694e-02, -5.69216463e-02,  2.61922912e-02,  8.30817468e-02],
                   [ 1.62304042e-01, -7.34813956e-03, -2.02552550e-01,  1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
                    -6.20429191e-03, -6.08892354e-02,  1.02883628e-02,  3.84878478e-02, -8.72920419e-03,  2.37291230e-02],
                   [ 1.69023095e-01,  6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03,  1.36407740e-01,
                     2.58515825e-02, -4.10328777e-04,  2.93149920e-02, -1.97874734e-02,  2.01177066e-02,  4.29260690e-03],
                   [ 4.16829358e-01, -1.28384095e+00,  8.86081556e-01,  9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
                    -3.19865507e-02, -1.71517045e-02,  3.47472066e-02, -3.53047665e-02,  5.58354602e-02, -5.06222122e-02],
                   [ 3.83948137e-01,  1.06020034e-01,  4.01191058e-01,  1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
                     2.27589858e-02, -5.67352733e-02,  3.84666644e-02, -2.15828055e-02, -1.67817151e-02,  1.15426241e-01],
                   [ 9.07946444e-01,  3.26120397e+00,  2.98472002e+00, -1.42615404e-01,  1.29886103e+00, -4.53380431e-01,
                     1.54008478e-01, -3.55297093e-02, -2.95809181e-01,  1.57037690e-01, -7.29692046e-02,  1.15180285e-01],
                   [ 1.60870896e+00, -2.32038235e+00, -7.96211044e-01,  1.55058968e+00, -2.19377663e+00,  5.01030526e-01,
                    -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
                   [ 6.42870687e-01,  1.34486839e+00,  2.16026845e-01, -2.13180345e-01,  3.10866747e-01, -3.97754955e-01,
                    -3.54439151e-01, -5.95938041e-04,  4.95054274e-03,  4.67013422e-02, -1.80823854e-02,  1.25808320e-01],
                   [ 1.16780496e+00,  2.28141229e+00, -3.29418720e+00, -1.54239912e+00,  2.12372153e-01,  2.51116768e+00,
                     1.84273560e+00, -4.06183916e-01,  1.19175125e+00, -9.24407446e-01,  6.85444429e-01, -6.38729005e-01],
                   [ 2.39097414e-01, -1.13382447e-02,  3.06327342e-01,  4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
                     3.46533705e-02,  1.46440386e-02,  6.88291154e-02,  1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
                   [ 1.74850329e-01, -1.86077411e-01,  2.69285838e-01,  5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
                    -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03,  1.20157063e-02,  2.01341977e-02],
                   [ 1.93220674e-01,  1.62738332e-01,  1.72794061e-02,  7.89933755e-02,  1.58494767e-01,  9.04541006e-04,
                    -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02,  2.84895062e-02],
                   [ 3.31179197e+00, -1.56765268e-01,  4.42446188e+00,  2.05496297e+00,  5.07031622e+00, -3.52663849e-02,
                    -5.68337901e+00, -1.17825301e+00,  5.41756637e-01, -3.15541339e-02, -1.58404846e+00,  7.37887234e-01],
                   [ 2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,  5.58739133e-01, -3.45340886e-01,
                     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01,  9.92353986e-03,  1.01775476e-01],
                   [ 3.13672832e+00,  1.55128891e+00,  4.60139512e+00,  9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
                    -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
                   [ 8.50714148e-01,  2.28658856e-01, -3.65260753e+00,  2.70626948e+00, -1.90441544e-01,  5.66625676e+00,
                     1.77531510e+00,  2.39978921e+00,  1.10965660e+00,  1.58484130e+00, -1.51579214e-02,  8.64324026e-01],
                   [ 1.14302559e+00,  1.18602811e+00, -3.88130412e+00,  8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
                     8.56022598e-01, -1.08015106e+00,  1.74840192e-01, -1.35493558e-02, -1.17012561e+00,  1.68572940e-01],
                   [ 3.54117814e+00,  6.12714769e-01,  7.67585243e+00,  2.50391333e+00,  1.81374399e+00, -1.46363231e+00,
                    -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01,  6.78297871e-01],
                   [ 8.31843044e-01,  4.41635485e-01,  7.00724425e-02, -4.72159900e-02,  3.08326493e-01, -4.47009822e-01,
                     3.27806057e-01,  6.52370380e-01,  3.28490360e-01,  1.28628172e-01, -7.78065861e-02,  6.91343399e-02],
                   [ 4.90082031e-01, -9.53180204e-01,  1.76970476e-01,  1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
                     3.91808304e-01,  2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01,  3.18934196e-01],
                   [ 1.49899454e+00, -4.30708817e-01,  2.43770498e+00,  7.03149621e-01, -2.28827845e+00,  2.70195855e+00,
                    -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00,  8.20855264e-01, -2.35859902e-01],
                   [ 1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,  6.93277338e-02, -1.31034684e-01,
                    -1.54955924e-03,  2.48094288e-02, -3.09576314e-02, -1.66369415e-03,  1.48904987e-04, -1.42151992e-02],
                   [ 6.52394765e-01, -6.81024464e-01,  6.36868117e-01,  3.04950208e-01,  2.62178992e-01, -3.20457080e-01,
                    -1.98576098e-01, -3.02173163e-01,  2.04399765e-01,  4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
                   [ 2.06762180e-01, -2.08101829e-01,  2.61977630e-01, -1.71672300e-01,  5.61794250e-02,  2.13660185e-01,
                     3.90259585e-02,  4.78176392e-02,  1.72812607e-02,  3.44052067e-02,  6.26899067e-03,  2.48544728e-02],
                   [ 7.39717363e-01,  4.37786285e+00,  2.54995502e+00,  1.13151212e+00, -3.58509503e-01,  2.20806129e-01,
                    -2.20500355e-01, -7.22409824e-02, -2.70534083e-01,  1.07942098e-03,  2.70174668e-01,  1.87279353e-01],
                   [ 1.25593809e+00,  6.71054880e-02,  8.70352571e-01, -4.32607959e+00,  2.30652217e+00,  5.47476105e+00,
                    -6.11052479e-01,  1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01,  9.68935954e-01],
                   [ 1.17233757e-01, -1.23897829e-01, -4.88625265e-01,  1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
                    -1.17525019e-02,  5.70221674e-02, -7.67796123e-03,  4.17505873e-02, -2.33375716e-02,  1.94121001e-02],
                   [ 1.67511025e+00, -2.75436700e+00,  1.45345593e+00,  1.32408871e+00, -1.66172505e+00,  1.00560074e+00,
                    -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02,  1.39449745e+00],
                   [ 3.20587677e+00, -2.84451104e+00,  8.54849957e+00, -4.44001235e-01,  1.04202144e+00,  7.35333682e-01,
                    -2.48763292e+00,  7.38931361e-01, -1.74185596e+00, -1.07581842e+00,  2.05759299e-01, -8.20483513e-01],
                   [ 3.31279737e+00, -5.08655734e-01,  6.61530870e+00,  1.16518280e+00,  4.74499155e+00, -2.31536191e+00,
                    -1.34016130e+00, -7.15381712e-01,  2.78890594e+00,  2.04189275e+00, -3.80003033e-01,  1.16034914e+00],
                   [ 1.79522019e+00, -8.13534697e-02,  4.37167420e-01,  2.26517020e+00,  8.85377295e-01,  1.07481514e+00,
                    -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00,  2.60097913e-01,  9.34596450e-01],
                   [ 3.50400906e-01,  8.17891485e-01, -8.63487084e-01, -7.31760701e-01,  9.70320805e-02, -3.60023996e-01,
                    -2.91753495e-01, -8.03073817e-02,  6.65930095e-02,  1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
                   [ 2.25922929e-01,  2.78461593e-01,  5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
                     2.31027499e-03,  5.87465112e-05,  1.86127188e-02,  2.83074747e-02, -1.87198676e-04,  1.24761782e-02],
                   [ 4.53615634e-01,  3.18976020e+00, -8.35029351e-01,  7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
                    -1.14521031e+00,  1.00044304e+00, -4.04084981e-01, -4.86030348e-01,  1.05412721e-01,  5.63666445e-02],
                   [ 3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
                    -2.15591352e-01, -1.21332941e-01,  1.42245002e-01,  5.02984582e-02, -8.05878851e-03,  1.95534173e-01],
                   [ 1.86913010e-01, -1.61000977e-01,  5.95612425e-01,  1.87804293e-01,  2.22064227e-01, -1.09008289e-01,
                     7.83845058e-02,  5.15228647e-02, -8.18113578e-02, -2.37860551e-02,  3.41013800e-03,  3.64680417e-02],
                   [ 3.32919314e+00, -2.14341251e+00,  7.20913997e+00,  1.76143734e+00,  1.64091808e+00, -2.66887649e+00,
                    -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01,  8.00557250e-01,  1.15628886e+00],
                   [ 4.76496444e-01, -1.19334793e-01,  3.09037235e-01, -3.45545294e-01,  1.30114716e-01,  5.06895559e-01,
                     2.12176840e-01, -4.14296750e-03,  4.52439064e-02, -1.62163990e-02,  6.93683152e-02, -5.77607592e-03],
                   [ 3.00019324e-01,  5.43432074e-02, -7.72732930e-01,  1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
                    -2.10011388e-01,  2.78202425e-01,  6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord type
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec 2005.
After this modification I tweaked a few more parameters before obtaining my
final results. Dividing the pitch and timbre frequencies by the duration of the song
normalized every song to frequency per second, but it also had the undesired effect
of making the data too small. Timbre and pitch frequencies per second were almost
always less than 1.0 and often hovered as low as 0.002 for nonzero values.
Because all of the values were very close to each other, using common values of α
in the range of 0.1 to 1000–2000 was insufficient to push the songs into different
clusters; as a result, every song fell into the same cluster. Increasing the value
of α by several orders of magnitude, to well over 10 million, fixed the problem, but
this solution presented two problems. First, tuning α to experiment with different
ways to cluster the music would be cumbersome, since I would have to work with
an enormous range of possible values for α. Second, pushing α to such high values
is not appropriate for the Dirichlet Process. A very high value of α indicates a
Dirichlet Process that will try to disperse the data into many clusters, but a value
of α that high is in principle assigning nearly every new song to a new cluster. On
the other hand, varying α between 0.1 and 1000, for example, presents a much wider
range of flexibility when assigning clusters. While clustering may be possible by
varying α over an extreme range with the data as it currently is, we would be using
the Dirichlet Process in a way it mathematically should not be used. Therefore,
multiplying all of the data by a constant value, so that we can work in the appropriate
range of α, is the better approach. After some experimentation, I found that k = 10 was
an appropriate scaling factor. After initial runs of the Dirichlet Process, I found
that there was a slight issue with some of the earlier songs. Since I had only artist
genre tags, not tags specific to each song, I chose songs based on whether any
of the tags associated with the artist fell under any electronic music genre, including
the generic term 'electronic'. Some bands, mostly older ones from the
1960s and 1970s like Electric Light Orchestra, had some electronic music but
mostly featured rock, funk, disco, or another genre. Given that these artists featured
mostly non-electronic songs, I decided to exclude them from my study and generate
a blacklist of these artists. While it was infeasible to look through
every single song and determine whether it was electronic or not, I was able to look
over the earliest songs in each cluster. These songs were the most important to verify
as electronic, because early non-electronic songs could seed new clusters
and inadvertently create clusters with non-electronic sounds that I was not looking for.
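The per-second normalization and rescaling described above can be sketched as follows. The constant k = 10 is the value settled on in the text; the function name and toy numbers are my own:

```python
K = 10  # scaling constant chosen in the text so alpha can stay in a usable range

def preprocess_counts(raw_counts, duration, k=K):
    """Normalize raw chord-change/timbre-category counts to per-second rates,
    then rescale by a constant so the feature values are not uniformly tiny."""
    return [k * c / duration for c in raw_counts]

# A 3-minute (180 s) song with 36 and 9 occurrences of two categories:
features = preprocess_counts([36, 0, 9], 180.0)
print(features)  # -> [2.0, 0.0, 0.5]
```

Without the factor of k, the same vector would be [0.2, 0.0, 0.05], illustrating why all songs ended up nearly indistinguishable at ordinary values of α.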
The goal of this thesis is to identify different groups into which EM songs are
clustered and to identify the most unique artists and genres. While the second task is
straightforward, because it requires looking at the earliest songs in each cluster, the
effectiveness of the first is difficult to gauge. While I can look at the average chord-change
and timbre-category frequencies in each cluster, as well as other metadata, attaching
semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the clustering for each value, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit of components, or clusters, allowed to 50. The values of α I used
resulted in 9, 14, and 19 clusters, respectively.
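The way α governs the number of clusters that emerge can be illustrated with the Chinese Restaurant Process view of the Dirichlet Process. This toy simulation (the parameter values are illustrative, not the thesis's actual model or data) shows that a larger α yields more clusters:

```python
import random

def crp_cluster_sizes(n, alpha, seed=0):
    """Simulate Chinese Restaurant Process seating for n items.

    Each new item joins an existing cluster with probability proportional to
    that cluster's size, or starts a new cluster with probability proportional
    to alpha. Returns the list of cluster sizes."""
    rng = random.Random(seed)
    sizes = []  # sizes[j] = number of items already in cluster j
    for i in range(n):
        r = rng.random() * (i + alpha)  # total weight so far is i + alpha
        for j, s in enumerate(sizes):
            if r < s:
                sizes[j] += 1
                break
            r -= s
        else:
            sizes.append(1)  # the leftover alpha-mass starts a new cluster
    return sizes

few = crp_cluster_sizes(2000, alpha=0.05)
many = crp_cluster_sizes(2000, alpha=5.0)
print(len(few), len(many))  # small alpha -> few clusters, large alpha -> many
```

The expected number of clusters grows roughly like α log n, which is why α values tuned for data on one scale fail completely when the feature values are rescaled.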
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are
skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord-change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
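Plots like Figure 3.2 can be produced by averaging the feature vectors within each cluster; the sketch below uses random stand-in data and an assumed feature layout (192 chord-change columns followed by timbre columns):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Assumed layout: the first 192 columns are chord-change frequencies
# ("pitch") and the remaining columns are timbre-category frequencies.
N_CHORD = 192

def plot_cluster_profiles(features, labels, outfile="profiles.png"):
    """Plot each cluster's mean feature vector: blue = pitch, green = timbre."""
    clusters = np.unique(labels)
    fig, axes = plt.subplots(len(clusters), 1,
                             figsize=(8, 2 * len(clusters)), squeeze=False)
    means = {}
    for ax, c in zip(axes.ravel(), clusters):
        mean = features[labels == c].mean(axis=0)
        means[int(c)] = mean
        ax.plot(range(N_CHORD), mean[:N_CHORD], color="blue", label="pitch")
        ax.plot(range(N_CHORD, mean.size), mean[N_CHORD:], color="green", label="timbre")
        ax.set_title("cluster %d" % c)
        ax.legend(loc="upper right")
    fig.tight_layout()
    fig.savefig(outfile)
    plt.close(fig)
    return means

# Demo on random stand-in data (222 = 192 chord-change + 30 timbre columns):
rng = np.random.default_rng(0)
X = rng.random((60, 222))
labels = np.array([i % 3 for i in range(60)])
profiles = plot_cluster_profiles(X, labels, outfile="cluster_profiles.png")
```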
A table listing each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster | Song Count | Characteristic Sounds
0  | 6481 | Minimalist, industrial, space sounds, dissonant chords
1  | 5482 | Soft, New Age, ethereal
2  | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3  | 360  | Very dense and complex synths, slightly darker tone
4  | 4550 | Heavily distorted rock and synthesizer
6  | 2854 | Faster-paced 80s synth rock, acid house
8  | 798  | Aggressive beats, dense house music
9  | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since neither sounded unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0  | 1339 | Instrumental and disco with 80s synth
1  | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2  | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3  | 1353 | Strong repetitive beats, ambient
4  | 2446 | Strong simultaneous beat and synths; synths defined but with echo
5  | 2672 | Calm, New Age
6  | 542  | Hi-hat cymbals, dissonant chord progressions
7  | 2725 | Aggressive punk and alternative rock
9  | 1647 | Latin, rhythmic emphasis on first and third beats
11 | 835  | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40   | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528  | Roland TR-like beats; kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0  | 4075 | Nostalgic and sad-sounding synths and string instruments
1  | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2  | 1546 | Jazz/funk tones
3  | 1691 | Orchestral with heavy 80s synths, atmospheric
4  | 343  | Arpeggios
5  | 304  | Electro, ambient
6  | 2405 | Alien synths, eerie
7  | 1264 | Punchy kicks and claps, 80s/90s tilt
8  | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9  | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791  | Cavernous, minimalist, ambient (non-electronic instruments)
14 | 765  | Downtempo, classic guitar riffs, fewer synths
16 | 865  | Classic acid house sounds and beats
17 | 682  | Heavy Roland TR sounds
22 | 14   | Fast, ambient, classic orchestral
23 | 578  | Acid house with funk tones
30 | 31   | Very repetitive rhythms, one or two tones
34 | 88   | Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Viewed through a traditional music lens, the genres inside the cluster varied significantly: it contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a year distribution closer to that of the entire MSD and consists of denser beats; the artist Cabaret Voltaire leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then rises rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer who pioneered ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. The term "trance" was not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre-category and chord-change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60 to minor → minor with no note change; type 120 to dominant 7th major → dominant 7th major with no note change; and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations gathered from several listeners [8]. While this strategy worked in Mauch's study, it was not practical in mine given the time and resources at my disposal. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord-change data are very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the MSD data and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitative, computational perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
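The 192-category chord-change encoding discussed above, which mirrors the computation in the appendix code (chord qualities numbered 1 through 4 for major, minor, dominant 7th major, and dominant 7th minor; roots 0 through 11 for C through B), can be written as a small standalone function:

```python
# Quality codes matching the appendix computation:
MAJOR, MINOR, DOM7_MAJOR, DOM7_MINOR = 1, 2, 3, 4

def chord_change_category(quality1, root1, quality2, root2):
    """Map a pair of consecutive chords to one of the 192 change categories."""
    if root1 == root2:
        note_shift = 0
    elif root1 < root2:
        note_shift = root2 - root1
    else:
        note_shift = 12 - root1 + root2           # wrap around the octave
    key_shift = 4 * (quality1 - 1) + quality2     # 1..16 quality pairs
    return 12 * (key_shift - 1) + note_shift      # 0..191

# The "stay in place" categories highlighted in the analysis:
print(chord_change_category(MAJOR, 0, MAJOR, 0))            # 0
print(chord_change_category(MINOR, 5, MINOR, 5))            # 60
print(chord_change_category(DOM7_MAJOR, 2, DOM7_MAJOR, 2))  # 120
print(chord_change_category(DOM7_MINOR, 9, DOM7_MINOR, 9))  # 180
```

Categories 0, 60, 120, and 180 are exactly the four quality-preserving, root-preserving changes, which is why they dominate in songs that stay in one key.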
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed for other values of α. Since increasing α forms more clusters, the distinctions between clusters become more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together based on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.

The timbre/chord-change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music
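The cross-α cluster comparison described above can be made more systematic with a contingency table between two label assignments; the labels here are small hypothetical examples, not the thesis's actual cluster output:

```python
from collections import Counter
from sklearn.metrics import adjusted_rand_score

def contingency(labels_a, labels_b):
    """Count how many songs fall in each pair of clusters across two runs."""
    return Counter(zip(labels_a, labels_b))

# Hypothetical labels for the same eight songs under two runs:
run_005 = [0, 0, 0, 9, 9, 9, 3, 3]   # e.g. alpha = 0.05
run_01  = [2, 2, 2, 3, 3, 5, 6, 6]   # e.g. alpha = 0.1
table = contingency(run_005, run_01)
print(table[(0, 2)])  # cluster 0 (alpha=0.05) maps entirely onto cluster 2 (alpha=0.1)
print(adjusted_rand_score(run_005, run_01))  # overall agreement, in [-1, 1]
```

A row of the table concentrated in one column is the "maps over nicely" case; a row spread across many columns is the harder-to-interpret case.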
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α runs, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also reflect the difficulty of distinguishing the clusters: the y-axis values on most of these charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small fraction of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment, and close with final remarks on this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various confounding factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. Looking through the tracks, I did see many important artists, meaning that the dataset had some credibility. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
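Song-level filtering of the kind described here can be sketched as follows; the genre set and tag data are hypothetical stand-ins, not the thesis's actual code or the Last.fm schema:

```python
# Hypothetical track-level tag data; the names here (TARGET_GENRES, song_tags)
# are illustrative, not taken from the thesis code base.
TARGET_GENRES = {"house", "techno", "trance", "ambient", "industrial", "idm"}

def is_em_song(track_tags, target_genres=TARGET_GENRES):
    """Keep a track only if one of its own tags names an EM genre."""
    return any(tag.lower() in target_genres for tag in track_tags)

song_tags = {
    "TRAAA1": ["acid house", "house", "80s"],  # EM by its own tags
    "TRAAA2": ["classic rock", "guitar"],      # excluded even if the artist has EM tags
}
em_tracks = [tid for tid, tags in song_tags.items() if is_em_song(tags)]
print(em_tracks)  # ['TRAAA1']
```

The key difference from the artist-level criterion in Appendix A.1 is that the tags are attached to individual tracks, so a rock song by an otherwise electronic artist is excluded.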
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had straightforward semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
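The BIC-based choice of the number of timbre categories mentioned above (46 in the thesis) can be sketched on toy data; the three-blob stand-in below is illustrative, and `best_k_by_bic` is a hypothetical helper, not the thesis's code:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy stand-in for sampled timbre frames: three well-separated
# 12-dimensional Gaussian blobs (the real data yielded 46 categories).
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(m, 0.2, size=(150, 12)) for m in (0.0, 1.5, 3.0)])

def best_k_by_bic(data, candidates):
    """Fit a GMM for each candidate component count; return the lowest-BIC count."""
    scores = {}
    for k in candidates:
        gmm = GaussianMixture(n_components=k, random_state=0).fit(data)
        scores[k] = gmm.bic(data)
    return min(scores, key=scores.get), scores

k, scores = best_k_by_bic(frames, range(1, 7))
print(k)  # the three-blob data should select 3
```

BIC penalizes extra components, so it rewards fitting the data well with as few timbre categories as possible.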
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 =  [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                        [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                        [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                        [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                        [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                        [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 =  [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                        [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                        [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                        [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                        [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                        [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
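The matchers above score each template against an observed vector with a Pearson-style correlation, padding each standard deviation with 0.01 to avoid division by zero. The following stripped-down sketch isolates that scoring rule using just two hypothetical chord templates in place of the full 48-template bank; the names and the test vector are illustrative only.

```python
import numpy as np

# Two 12-dimensional binary chord templates (C natural up to B natural),
# hypothetical stand-ins for the full template bank above.
C_MAJOR = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # C, E, G
C_MINOR = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]   # C, Eb, G

def correlation_score(template, pitch_vector):
    """Pearson-style correlation between a binary template and an observed
    chroma vector, with 0.01 added to each stdev to avoid div-by-zero."""
    t = np.asarray(template, dtype=float)
    p = np.asarray(pitch_vector, dtype=float)
    num = np.sum((t - t.mean()) * (p - p.mean()))
    return num / ((t.std() + 0.01) * (p.std() + 0.01))

# A chroma vector with most of its energy on C, E, and G should correlate
# more strongly with the major template than with the minor one.
observed = [0.9, 0.1, 0.0, 0.1, 0.8, 0.0, 0.1, 0.7, 0.0, 0.1, 0.0, 0.1]
scores = {'major': correlation_score(C_MAJOR, observed),
          'minor': correlation_score(C_MINOR, observed)}
best = max(scores, key=lambda k: abs(scores[k]))
```

The full functions simply run this comparison over every transposition of the major, minor, dominant-7th, and minor-7th templates (and over the 46 timbre centroids) and keep the template with the largest absolute correlation.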
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec 2005.
mostly featured rock, funk, disco, or another genre. Given that these artists featured
mostly non-electronic songs, I decided to exclude them from my study and compiled
a blacklist of these artists. While it was infeasible to look through every single
song and determine whether or not it was electronic, I was able to look over the
earliest songs in each cluster. These songs were the most important to verify as
electronic, because early non-electronic songs could end up seeding new clusters and
inadvertently create clusters with non-electronic sounds that I was not looking for.
The goal of this thesis is to identify the different groups into which EM songs are
clustered and to identify the most unique artists and genres. While the second task is
fairly simple, since it only requires looking at the earliest songs in each cluster, the
effectiveness of the first is difficult to gauge. While I can look at the average chord-change
and timbre-category frequencies in each cluster, as well as other metadata, attaching
semantic interpretations to what the music actually sounds like and determining
whether the music is clustered properly is a very subjective process. For this reason,
I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and
compared the resulting clusterings, examining similarities and differences in
the clusters formed in each scenario in the Discussion section. For each value of α, I
set the upper limit on the number of components, or clusters, allowed to 50. These
values of α resulted in 9, 14, and 19 significant clusters, respectively.
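This kind of truncated Dirichlet Process run can be illustrated with scikit-learn's BayesianGaussianMixture (available since version 0.18): with the cap set to 50 components and the concentration parameter α, components whose mixture weights collapse toward zero go effectively unused, which is why a run can yield far fewer than 50 clusters. The sketch below uses synthetic blobs, not the thesis's actual feature set, and its settings are illustrative only.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# Toy stand-in for the song feature vectors: three well-separated blobs.
X = np.vstack([rng.normal(loc, 0.3, size=(200, 2))
               for loc in ([0, 0], [5, 5], [0, 5])])

# Truncated Dirichlet Process mixture: up to 50 components, with the
# concentration parameter alpha controlling how many are actually used.
dpgmm = BayesianGaussianMixture(
    n_components=50,
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.05,   # alpha
    max_iter=500,
    random_state=0,
)
dpgmm.fit(X)
labels = dpgmm.predict(X)
# Count components that received a non-negligible share of the data.
n_used = int(np.sum(dpgmm.weights_ > 0.01))
```

Because the cluster assignments come from `predict` over all 50 potential components, the surviving labels are generally not numbered sequentially, which matches the skipped cluster numbers observed in the tables below.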
3.2 Findings
3.2.1 α = 0.05
When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are
the distributions of the years of the songs in each cluster (note that the Dirichlet Process
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are
skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord-change
category and timbre category for each cluster and plotted the results. The green
lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions
of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is
shown below.
Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and they did not sound unique, so
I discarded them from the clusters). Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin, rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them
for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has
a heavier left tail, indicating a larger share of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. Under the Dirichlet Process,
however, these songs were lumped together by the common theme of dense, melodic
textures (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, has a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then rises rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out, ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note
rhythm that, combined with the ethereal synths and their chord progressions,
sounded very similar to trance music. In fact, I found that, stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself, but is rather a clever combination of two
older genres. Lastly, we look at the timbre-category and chord-change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change, type 60 to minor → minor with no note change, type
120 to dominant 7th major → dominant 7th major with no note change, and type 180
to dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that chords occurring next to each other in a song remain in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to interpret intuitively. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in mine. I ended up comparing my subjective
summaries of each cluster against the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the
timbre and chord-change data are very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed, and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value, but also comparing the clusters formed
to those of other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely, while others were more difficult to interpret. For example, cluster
3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar
distribution of release years to cluster 9_0.05. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 grouping differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked
up more nuanced instrumentation and mood differences. For example, cluster 16_0.1
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together by certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord-change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters like 28 contained one timbre category that completely dominated
all the others Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine which matches the graph
Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre
category were noted for containing strong and repetitive beats For this cluster I
added the following artists and their contributions to the general list of novel artists
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
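The observation that a cluster from one run "does not easily map" onto clusters from another can be made precise by cross-tabulating the two sets of per-song assignments. The sketch below uses hypothetical label vectors (not the thesis's actual assignments) with scikit-learn's contingency matrix and adjusted Rand index:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.cluster import contingency_matrix

# Stand-ins for per-song cluster assignments from two Dirichlet Process runs
# (e.g. alpha = 0.05 and alpha = 0.1); real runs would have one label per song.
labels_a05 = np.array([0, 0, 1, 1, 2, 2, 2, 0])
labels_a10 = np.array([0, 1, 2, 2, 3, 3, 2, 1])

# Rows: alpha = 0.05 clusters; columns: alpha = 0.1 clusters. A row dominated
# by a single column means that coarse cluster maps cleanly to one fine cluster.
C = contingency_matrix(labels_a05, labels_a10)
print(C)

# One number summarizing agreement between the two partitions
# (1.0 = identical up to relabeling, near 0.0 = no better than chance).
print(adjusted_rand_score(labels_a05, labels_a10))
```

A spread-out contingency row is exactly the "shares characteristics but does not map" situation described above.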
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters (10 out of 19) contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α runs, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters. The y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
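The way α controls how many clusters the process spends can be reproduced in scikit-learn. The DPGMM class used for this thesis has since been replaced by BayesianGaussianMixture, so the sketch below uses that API on synthetic stand-in features; the data and parameter values are illustrative, not the thesis's actual inputs.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# Stand-in for the per-song feature vectors (chord-change frequencies and
# timbre category counts): three synthetic groups in 5 dimensions.
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(100, 5)) for m in (0, 2, 4)])

# weight_concentration_prior plays the role of alpha: larger values let the
# Dirichlet Process spread weight over more components.
for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=20,  # truncation level, not the final cluster count
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        max_iter=500, random_state=0)
    labels = dpgmm.fit_predict(X)
    print(alpha, len(np.unique(labels)))
```

As in the experiments above, the number of occupied clusters is inferred from the data rather than fixed in advance, with α tilting the prior toward fewer or more clusters.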
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level.
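A minimal sketch of what that song-level filter could look like, with an illustrative tag dictionary and genre list standing in for the real Last.fm data:

```python
# Hypothetical EM genre list; the thesis's actual list is longer.
EM_GENRES = {'house', 'techno', 'trance', 'dubstep', 'ambient', 'idm'}

def is_em_song(track_tags, genres=EM_GENRES):
    """Keep a track only if one of ITS OWN tags names an EM genre,
    so rock songs by mostly-electronic artists are excluded."""
    return any(tag.lower() in genres for tag in track_tags)

# Illustrative track-level tag data keyed by MSD-style track IDs.
song_tags = {
    'TRAAAAA1': ['acid house', 'house', '90s'],
    'TRAAAAA2': ['classic rock', 'guitar'],
}
em_tracks = [tid for tid, tags in song_tags.items() if is_em_song(tags)]
print(em_tracks)
```

Filtering at the song level rather than the artist level is what the Last.fm tag dataset would make possible.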
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.
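As a simple stand-in for the timbre-similarity methods of [15], songs could be compared by the cosine similarity of their timbre-category histograms; the histograms below are toy values (4 bins rather than 46), not taken from the dataset.

```python
import numpy as np

def timbre_similarity(counts_a, counts_b):
    """Cosine similarity between normalized timbre-category histograms."""
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    a /= a.sum()
    b /= b.sum()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

song_x = [10, 0, 5, 0]  # toy 4-bin histograms for brevity
song_y = [8, 1, 6, 0]
song_z = [0, 9, 0, 7]
# song_x should be closer to song_y than to the disjoint song_z.
print(timbre_similarity(song_x, song_y) > timbre_similarity(song_x, song_z))
```

A song-to-song similarity like this would give a quantitative check that songs inside a cluster really do sound more alike than songs across clusters.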
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, with the dataset's songs accessible and methods for comparing songs to each other in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import collections
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time()-start_time)
            count += 1
            print('song count {0}'.format(count+1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# OrderedDict preserves the chronological sort (a plain dict would not in Python 2)
all_song_data_sorted = collections.OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time()-time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords)-1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0]-1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c/json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time()-time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time()-time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t/json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time()-time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time()-time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

def _rows(strings):
    # expand compact '100010010000' strings into 12-element 0/1 vectors
    return [[int(c) for c in s] for s in strings]

CHORD_TEMPLATE_MAJOR = _rows(['100010010000', '010001001000',
                              '001000100100', '000100010010',
                              '000010001001', '100001000100',
                              '010000100010', '001000010001',
                              '100100001000', '010010000100',
                              '001001000010', '000100100001'])
CHORD_TEMPLATE_MINOR = _rows(['100100010000', '010010001000',
                              '001001000100', '000100100010',
                              '000010010001', '100001001000',
                              '010000100100', '001000010010',
                              '000100001001', '100010000100',
                              '010001000010', '001000100001'])
CHORD_TEMPLATE_DOM7 = _rows(['100010010010', '010001001001',
                             '101000100100', '010100010010',
                             '001010001001', '100101000100',
                             '010010100010', '001001010001',
                             '100100101000', '010010010100',
                             '001001001010', '000100100101'])
CHORD_TEMPLATE_MIN7 = _rows(['100100010010', '010010001001',
                             '101001000100', '010100100010',
                             '001010010001', '100101001000',
                             '010010100100', '001001010010',
                             '000100101001', '100010010100',
                             '010001001010', '001000100101'])

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (family, root): 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table listing each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster | Song Count | Characteristic Sounds
0       | 6481       | Minimalist, industrial, space sounds, dissonant chords
1       | 5482       | Soft, New Age, ethereal
2       | 2405       | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3       | 360        | Very dense and complex synths, slightly darker tone
4       | 4550       | Heavily distorted rock and synthesizer
6       | 2854       | Faster-paced 80s synth rock, acid house
8       | 798        | Aggressive beats, dense house music
9       | 1464       | Ambient house, trancelike, strong beats, mysterious tone
11      | 1597       | Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster | Song Count | Characteristic Sounds
0       | 1339       | Instrumental and disco with 80s synth
1       | 2109       | Simultaneous quarter-note and sixteenth-note rhythms
2       | 4048       | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3       | 1353       | Strong repetitive beats, ambient
4       | 2446       | Strong simultaneous beat and synths; synths defined but echo
5       | 2672       | Calm, New Age
6       | 542        | Hi-hat cymbals, dissonant chord progressions
7       | 2725       | Aggressive punk and alternative rock
9       | 1647       | Latin, rhythmic emphasis on first and third beats
11      | 835        | Standard medium-fast rock instruments/chords
16      | 1152       | Orchestral, especially violins
18      | 40         | "Martian alien" sounds, no vocals
20      | 1590       | Alternating strong kick and strong high-pitched clap
28      | 528        | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0       | 4075       | Nostalgic and sad-sounding synths and string instruments
1       | 2068       | Intense, sad, cavernous (mix of industrial metal and ambient)
2       | 1546       | Jazz/funk tones
3       | 1691       | Orchestral with heavy 80s synths, atmospheric
4       | 343        | Arpeggios
5       | 304        | Electro, ambient
6       | 2405       | Alien synths, eerie
7       | 1264       | Punchy kicks and claps, 80s/90s tilt
8       | 1561       | Medium tempo, 4/4 time signature, synths with intense guitar
9       | 1796       | Disco rhythms and instruments
10      | 2158       | Standard rock with few (if any) synths added on
12      | 791        | Cavernous, minimalist, ambient (non-electronic instruments)
14      | 765        | Downtempo, classic guitar riffs, fewer synths
16      | 865        | Classic acid house sounds and beats
17      | 682        | Heavy Roland TR sounds
22      | 14         | Fast, ambient, classic orchestral
23      | 578        | Acid house with funk tones
30      | 31         | Very repetitive rhythms, one or two tones
34      | 88         | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together, with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], Jarre wrote Les Chants Magnétiques IV, a song that contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. "Trance" was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting the different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60 to minor → minor with no note change; type 120 to dominant 7th major → dominant 7th major with no note change; and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
- Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
- Cabaret Voltaire: orchestral electronic music
- Paul Horn: new age
- Brian Eno: ambient music
- Manuel Göttsching (Ashra): synth-heavy ambient music
- Killing Joke: industrial metal
- John Foxx: minimalist and dark electronic music
- Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
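The chord-change encoding discussed above (categories 0, 60, 120, and 180 for same-root transitions within each of the four key types) can be reproduced from the computation in Appendix A.2; each chord is a (key type, root) tuple, where my reading of the text assigns key types 1-4 to major, minor, dominant 7th major, and dominant 7th minor:

```python
def chord_change_category(c1, c2):
    """Map an ordered pair of chords to one of 192 chord-change categories.

    Each chord is a (key_type, root) tuple: key_type in 1..4 and
    root a pitch class in 0..11.
    """
    # note_shift: how far the root moves upward, modulo 12
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]       # which of the 16 key-type pairs
    return 12 * (key_shift - 1) + note_shift  # 0..191

# the four "same key type, no note change" categories named in the analysis
print(chord_change_category((1, 0), (1, 0)))  # major -> major: 0
print(chord_change_category((2, 5), (2, 5)))  # minor -> minor: 60
print(chord_change_category((3, 7), (3, 7)))  # dom7 major -> dom7 major: 120
print(chord_change_category((4, 2), (4, 2)))  # dom7 minor -> dom7 minor: 180
```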
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3 of the α = 0.1 run contained a similar number of songs and a similar distribution of release years to cluster 9 of the α = 0.05 run. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From these clusters, I added the following artists and their contributions to the general list of novel artists:
- Bill Nelson: minimalist house music
- Vangelis: orchestral compositions with electronic notes
- Rick Wakeman: rock compositions with spacy-sounding synths
- Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small fraction of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
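The effect of the concentration parameter α on how many clusters a Dirichlet Process mixture discovers can be sketched in a few lines. This is only a stand-in for the thesis's setup: it uses scikit-learn's BayesianGaussianMixture with a truncated Dirichlet Process prior (the successor to the DPGMM class available at the time) on synthetic two-dimensional data rather than the MSD features, and the 1e-2 weight threshold for counting a component as occupied is an arbitrary choice:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# synthetic "songs": three well-separated 2-D blobs standing in for feature vectors
X = np.vstack([rng.normal(loc, 0.3, size=(200, 2))
               for loc in ([0, 0], [4, 0], [0, 4])])

active = {}
for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=20,  # truncation level, an upper bound on the cluster count
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,  # the concentration parameter alpha
        random_state=0,
    ).fit(X)
    # count components that received an appreciable share of the data
    active[alpha] = int(np.sum(dpgmm.weights_ > 1e-2))

print(active)  # number of effectively occupied components per alpha
```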
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.
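The kind of BIC-based model selection mentioned above (choosing the number of categories where the Bayesian Information Criterion is lowest) can be sketched as follows; this uses synthetic data and plain Gaussian mixtures rather than the thesis's actual timbre features:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# synthetic data drawn from four well-separated 2-D Gaussians
X = np.vstack([rng.normal(m, 0.25, size=(150, 2))
               for m in ([0, 0], [3, 0], [0, 3], [3, 3])])

# fit a GMM for each candidate component count and record its BIC
bics = {}
for k in range(1, 9):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics[k] = gmm.bic(X)  # lower BIC = better fit/complexity trade-off

best_k = min(bics, key=bics.get)
print(best_k)  # the four-blob data should select k = 4
```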
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and improvements to the dataset, the songs accessed from it, and the methods for comparing songs to each other are accomplished, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
from collections import OrderedDict
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for every electronic music song
found in the MSD and writes it, sorted by year, to a text file.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture
10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 import ast15
16 prevents output from showing ellipses when printed17 npset_printoptions(threshold=npnan)
58
18
19 column-wise mean of list of lists20 def mean(a)21 return sum(a) len(a)22
23 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo
24
25 basedir = rsquoscratchnetworkmssilvermssilverrsquo26 input_file = basedir + rsquomsd_dataraw_rsquo + str(sysargv[1]) + rsquotxtrsquo27 output_file = basedir + rsquomsd_datapreprocessed_rsquo + str(sysargv[1]) + rsquotxtrsquo28
29 json_contents = open(input_filersquorrsquo)read()30
31 all_song_data = []32 time_start = timetime()33 count = 034 for json_object_str in refinditer(rsquorsquotitlersquojson_contents)35 json_object_str = str(json_object_strgroup(0))36 json_object = astliteral_eval(json_object_str)37 json_object_new = 38
json_object_new['title'] = json_object['title']
json_object_new['artist_name'] = json_object['artist_name']
json_object_new['year'] = json_object['year']
json_object_new['duration'] = json_object['duration']

segments_pitches_old = json_object['pitches']
segments_timbre_old = json_object['timbre']
segments_pitches_old_smoothed = []
segments_timbre_old_smoothed = []
chord_changes = [0 for i in range(0, 192)]
smoothing_factor = 5
for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
    segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
    # calculate mean frequency of each note over a block of 5 time segments
    segments_mean = map(mean, zip(*segments))
    segments_pitches_old_smoothed.append(segments_mean)
most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
# calculate chord changes
for i in range(0, len(most_likely_chords) - 1):
    c1 = most_likely_chords[i]
    c2 = most_likely_chords[i+1]
    if (c1[1] == c2[1]):
        note_shift = 0
    elif (c1[1] < c2[1]):
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4*(c1[0] - 1) + c2[0]
    # convert note_shift (0 through 11) and key_shift (1 to 16)
    # to one of 192 categories for a chord shift
    chord_shift = 12*(key_shift - 1) + note_shift
    chord_changes[chord_shift] += 1
json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
    segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
    # calculate mean frequency of each note over a block of 5 time segments
    segments_mean = map(mean, zip(*segments))
    segments_timbre_old_smoothed.append(segments_mean)
print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
all_song_data.append(json_object_new)
count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
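The chord-shift encoding used above (12 root offsets times 16 ordered pairs of chord types = 192 categories) can be sanity-checked in isolation. The following is a minimal Python 3 sketch, not the thesis code itself; it assumes the (type, root) tuple convention returned by find_most_likely_chord, where type is 1 = major, 2 = minor, 3 = dominant 7th, 4 = minor 7th and root is 0 to 11.

```python
def chord_shift_category(c1, c2):
    """Map an ordered pair of (chord_type, root) tuples to one of the
    192 chord-change categories used in the preprocessing above."""
    # root movement in semitones, wrapped into 0..11
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    # ordered pair of chord types (4 x 4 possibilities), numbered 1..16
    key_shift = 4 * (c1[0] - 1) + c2[0]
    return 12 * (key_shift - 1) + note_shift

# Categories 0, 60, 120, and 180 correspond to major->major, minor->minor,
# dom7->dom7, and min7->min7 with no root movement, matching the analysis
# in Section 3.3.
print(chord_shift_category((1, 0), (1, 0)))  # 0
print(chord_shift_category((2, 5), (2, 5)))  # 60
```

This makes explicit why categories 0, 60, 120, and 180 dominate in the cluster charts: they are exactly the "same chord type, same root" transitions.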
A3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
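The acceptance rule above keeps each song with probability min(1, N / count(year)), so every release year contributes roughly N songs to the sample. As a hedged illustration, the rule can be written as a standalone Python 3 function; the function and variable names here are hypothetical, not taken from the thesis code.

```python
import random

def sample_uniform_over_years(songs, year_counts, n_per_year=20, seed=0):
    """Down-sample songs so each release year contributes ~n_per_year of them.

    songs: iterable of dicts with a 'year' key.
    year_counts: total number of songs available per year.
    Each song is kept with probability min(1, n_per_year / count),
    mirroring the acceptance rule in the listing above.
    """
    rng = random.Random(seed)
    kept = []
    for song in songs:
        count = year_counts[song['year']]
        prob = min(1.0, n_per_year / count)
        if rng.random() < prob:
            kept.append(song)
    return kept

# A year with fewer than n_per_year songs is kept in full (prob = 1).
rare = [{'year': 1956, 'title': 't%d' % i} for i in range(2)]
print(len(sample_uniform_over_years(rare, {1956: 2}, n_per_year=20)))  # 2
```

This is why sparsely represented early years (e.g. 1956, with only 2 songs) survive the down-sampling intact while heavily represented years like 2007 are thinned aggressively.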
A4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
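The template-matching idea behind find_most_likely_chord can be checked directly with a synthetic chroma vector. This is an illustrative Python 3 sketch, not thesis code: the chroma values are made up, and only the C major and D major binary templates from the tables above are assumed.

```python
import numpy as np

# Hypothetical 12-bin pitch (chroma) vector dominated by C, E, and G,
# i.e. a C major triad.
pitch_vector = np.array([0.9, 0.05, 0.1, 0.05, 0.8, 0.1,
                         0.05, 0.85, 0.1, 0.05, 0.1, 0.05])

c_major = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])
d_major = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0])

def rho(template, pv):
    """Smoothed correlation between a binary chord template and a chroma
    vector, mirroring the form used in find_most_likely_chord (the 0.01
    terms guard against division by zero for flat vectors)."""
    num = np.sum((template - template.mean()) * (pv - pv.mean()))
    return num / ((template.std() + 0.01) * (pv.std() + 0.01))

# The C major template should correlate far better than D major,
# since the energy sits exactly on the C major triad bins.
print(rho(c_major, pitch_vector) > rho(d_major, pitch_vector))  # True
```

The same score is computed against all 48 chord templates in the helper above, and the template with the largest absolute correlation wins.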
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.
[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec. 2005.
Figure 3.1: Song year distributions for α = 0.05
For each value of α, I also calculated the average frequency of each chord-change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
Figure 3.2: Timbre and pitch distributions for α = 0.05
A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal
Table 3.1: Song cluster descriptions for α = 0.05
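Clusters like these come out of a Dirichlet Process Gaussian Mixture Model fit with scikit-learn's mixture module. As a hedged illustration of the role of α (not the thesis pipeline, and with synthetic 12-dimensional toy data in place of the real song features), the current scikit-learn API exposes a truncated DP mixture through BayesianGaussianMixture, where weight_concentration_prior plays the role of α:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Toy stand-in for per-song feature vectors: three well-separated
# 12-dimensional blobs of 100 points each.
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(100, 12))
               for c in (0.0, 1.0, 2.0)])

# weight_concentration_prior is the DP concentration parameter: larger
# values let the (truncated) process spread mass over more components,
# which is why raising alpha from 0.05 to 0.2 yields more clusters.
dpgmm = BayesianGaussianMixture(
    n_components=10,  # truncation level: an upper bound, not a target
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.05,
    max_iter=500,
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
# Typically only a few of the 10 available components receive songs;
# the rest are left empty by the stick-breaking prior.
```

The key design point mirrored from the thesis is that the number of clusters is not fixed in advance: only the truncation level and α are chosen, and the data determines how many components are actually used.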
3.2.2 α = 0.1
A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, the timbre and pitch distributions, and the cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echoing
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin; rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy
Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, the timbre and pitch distributions, and the cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)
Table 3.3: Song cluster descriptions for α = 0.2
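The analysis below compares clusters across the α = 0.05, 0.1, and 0.2 runs by ear and by qualitative description. One way that "mapping over" between runs could be quantified (a natural extension, not something done in the thesis) is the adjusted Rand index, which scores agreement between two clusterings of the same songs while ignoring the arbitrary cluster labels. The label vectors below are hypothetical, purely for illustration.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# Hypothetical cluster labels for the same eight songs, in the same order,
# from two runs with different concentration parameters.
labels_a05 = np.array([0, 0, 1, 1, 2, 2, 3, 3])  # e.g. alpha = 0.05 run
labels_a10 = np.array([5, 5, 2, 2, 7, 7, 1, 0])  # e.g. alpha = 0.1 run

# ARI is 1.0 when the two partitions agree up to relabeling, near 0 for
# unrelated partitions, and can be slightly negative for worse-than-chance
# agreement. Here the last pair of songs is split apart in the second run,
# so the score falls below 1.
score = adjusted_rand_score(labels_a05, labels_a10)
```

A score near 1 between two runs would indicate that a cluster such as 9 (α = 0.05) genuinely corresponds to a cluster in the α = 0.1 run, rather than merely resembling it qualitatively.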
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style, with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13], coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens: the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together around a common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because their frequency implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more
difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category, playing the sounds, and attaching user-based interpretations gathered from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. Instead, I compared my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: New Age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those of other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these consisted of only one song each, and upon listening, neither song sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3 in the α = 0.1 run contained a similar number of songs, and a similar distribution of release years, to cluster 9 in the α = 0.05 run. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05).
One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters from α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord-change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, especially 18. Cluster 18 consisted of vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters like 28 contained one timbre category that completely dominated
all the others Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine which matches the graph
Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre
category were noted for containing strong and repetitive beats For this cluster I
added the following artists and their contributions to the general list of novel artists
bull Bill Nelson Minimalist house music
bull Vangelis Orchestral compositions with electronic notes
bull Rick Wakeman Rock compositions with spacy-sounding synths
bull Kraftwerk synth-based pop music
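The relationship described above, where a larger α produces more and finer clusters, can be illustrated with a small Chinese restaurant process simulation. This is a toy sketch of the Dirichlet Process's clustering behavior, not the thesis's DP-GMM pipeline; the α values and song counts below are illustrative, not the ones used in the experiments:

```python
import random

def crp_cluster_counts(n_songs, alpha, seed=0):
    """Simulate Chinese restaurant process seating: each new song joins an
    existing cluster with probability proportional to that cluster's size,
    or starts a new cluster with probability proportional to alpha."""
    rng = random.Random(seed)
    sizes = []  # sizes[k] = number of songs currently in cluster k
    for i in range(n_songs):
        # total weight over all choices is i (songs seated so far) + alpha
        r = rng.uniform(0, i + alpha)
        if not sizes or r < alpha:
            sizes.append(1)  # open a new cluster
        else:
            # walk the cumulative cluster sizes to pick an existing cluster
            acc = alpha
            for k in range(len(sizes)):
                acc += sizes[k]
                if r < acc:
                    sizes[k] += 1
                    break
            else:
                sizes[-1] += 1  # guard against floating-point edge cases
    return len(sizes)
```

Averaged over random seeds, the number of clusters in this toy model grows roughly like α log(1 + n/α), which makes the monotone effect of α on cluster count easy to see even though the concentration parameter in the actual DP-GMM is scaled differently.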
Finally, we look at α = 0.2. With this parameter value the Dirichlet Process formed 22 clusters. Three of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α runs, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also reflect the difficulty of distinguishing clusters: the y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small fraction of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
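The singleton-cluster filtering applied at each value of α above (listening to one-song clusters and discarding them before analysis) amounts to a simple size threshold on the fitted labels. A minimal sketch, where the label list is hypothetical rather than real cluster assignments:

```python
from collections import Counter

def filter_small_clusters(labels, min_size=2):
    """Return the sorted cluster labels worth analyzing: those whose song
    count meets min_size. Singleton clusters are discarded, mirroring the
    treatment of one-song clusters in the text."""
    counts = Counter(labels)
    return sorted(k for k, n in counts.items() if n >= min_size)

# Example: cluster assignments for ten songs; clusters 4 and 7 are singletons.
labels = [0, 0, 1, 1, 1, 2, 2, 4, 7, 0]
print(filter_small_clusters(labels))  # [0, 1, 2]
```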
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from Last.fm, which contains user-generated tags at the song level.
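The artist-tag selection criterion described above can be sketched as follows. The shortened genre list and the tag lists are illustrative stand-ins, not the thesis's full predetermined list or real MSD tag data; song-level tags (as from Last.fm) would replace the artist tags to avoid pulling in an artist's non-EM material:

```python
# Keep a song if any of its artist's tags appears in a predetermined
# list of EM genres (a subset shown here for illustration).
TARGET_GENRES = {'house', 'techno', 'trance', 'ambient', 'breakbeat'}

def is_em_song(artist_tags):
    """True if any artist tag matches a target EM genre (case-insensitive)."""
    return any(tag.lower() in TARGET_GENRES for tag in artist_tags)

# Hypothetical artist tag lists:
print(is_em_song(['Techno', 'german']))   # True
print(is_em_song(['rock', 'post-punk']))  # False
```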
Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.
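One simple form of the song-to-song timbre comparison mentioned above is cosine similarity between songs' average timbre vectors. This is a sketch of the general idea rather than the method of [15], and the two-dimensional "timbre" vectors are hypothetical (real MSD segments are 12-dimensional):

```python
import math

def mean_timbre(segments):
    """Column-wise mean over a song's per-segment timbre vectors."""
    n = len(segments)
    return [sum(seg[d] for seg in segments) / n for d in range(len(segments[0]))]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Two hypothetical songs with 2-dimensional "timbre" segments for brevity:
song_a = [[1.0, 0.0], [0.8, 0.2]]
song_b = [[0.9, 0.1], [1.0, 0.0]]
print(round(cosine_similarity(mean_timbre(song_a), mean_timbre(song_b)), 3))  # 0.998
```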
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, with the dataset built, its songs accessible, and methods for comparing songs to each other in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant pitch and timbre metadata for each
electronic song found in the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print ('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
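As a worked example of the chord-shift encoding in the listing above, the following stand-alone re-implementation maps a pair of (chord family, root) tuples, where family 1-4 means major, minor, dominant 7th, or minor 7th, to one of the 12 × 16 = 192 chord-shift categories:

```python
def chord_shift_index(c1, c2):
    """Encode a transition between two chords, each given as
    (family, root) with family in 1..4 and root in 0..11,
    as an index in 0..191."""
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]       # wrap around the octave
    key_shift = 4 * (c1[0] - 1) + c2[0]       # one of 16 family transitions
    return 12 * (key_shift - 1) + note_shift

# C major (family 1, root 0) to A minor (family 2, root 9):
# note_shift = 9, key_shift = 4*0 + 2 = 2, index = 12*1 + 9 = 21
print(chord_shift_index((1, 0), (2, 9)))  # 21
```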
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
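The year-balanced sampling in the listing above accepts each song with probability min(1, N / count of songs in that year), so over-represented years are downsampled toward roughly N songs each. A stand-alone sketch of that acceptance rule:

```python
def acceptance_prob(n_target, year_total):
    """Probability of keeping a song from a year containing year_total
    songs, aiming for about n_target samples per year."""
    return min(1.0, float(n_target) / year_total)

print(acceptance_prob(20, 742))  # a crowded year (2010) is heavily downsampled
print(acceptance_prob(20, 4))    # a sparse year (1965) keeps every song
```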
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
67
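As a quick sanity check of the transposition logic in transpose_by_key above: a chroma vector whose energy lies entirely on the song's key should end up with that energy at index 0. This standalone sketch restates the helper as a one-liner so it can run without the rest of msd_utils:

```python
def transpose_by_key(pitch_seg, key):
    # rotate the 12-bin chroma vector so the song's key lands at index 0
    return [pitch_seg[(i + key) % 12] for i in range(12)]

# a segment whose energy sits entirely on pitch class 3 (the song's key)
seg = [0.0] * 12
seg[3] = 1.0
print(transpose_by_key(seg, 3))  # energy now at index 0
```

Transposing by key 0 leaves the vector unchanged, so normalized songs in different keys become directly comparable.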
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: the drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster paced; 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echoing
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on the first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted of 1 song each, none of which was particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the year distributions of most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style, with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13], coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, a few songs in that cluster came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together around the common theme of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer who pioneered ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category, playing the sounds, and attaching user-based interpretations gathered from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster with the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for every cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: New Age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
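The chord-change indices quoted in the analysis (0, 60, 120, 180) follow from the encoding in Appendix A.2, where each chord is a (quality, root) pair with qualities 1 = major, 2 = minor, 3 = dominant 7th major, 4 = dominant 7th minor. A minimal sketch (the `chord_shift` helper below is mine, written to mirror the appendix arithmetic):

```python
def chord_shift(c1, c2):
    """Map a transition between two (quality, root) chords to one of the
    192 chord-change categories: 16 quality pairs x 12 root shifts."""
    note_shift = (c2[1] - c1[1]) % 12      # semitone movement of the root
    key_shift = 4 * (c1[0] - 1) + c2[0]    # quality-pair index, 1..16
    return 12 * (key_shift - 1) + note_shift

# the four "same quality, no note change" transitions cited in the text
print(chord_shift((1, 0), (1, 0)))  # major -> major: 0
print(chord_shift((2, 5), (2, 5)))  # minor -> minor: 60
print(chord_shift((3, 9), (3, 9)))  # dom7 major -> dom7 major: 120
print(chord_shift((4, 2), (4, 2)))  # dom7 minor -> dom7 minor: 180
```

Dividing an index by 12 recovers the quality pair and the remainder gives the root shift, which is how the peaks at 0, 60, 120, and 180 are read off the chord-change histograms.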
While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing those clusters to the ones formed with other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between clusters will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.
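The relationship noted above between α and the number of clusters can be seen in the Chinese Restaurant Process view of the Dirichlet Process. This toy simulation is my own sketch of a prior draw (not the thesis's posterior fit, so the cluster counts will not match the thesis's), showing only that larger α opens more clusters:

```python
import random

def crp_num_clusters(alpha, n_songs, seed=0):
    """Simulate one draw from the Chinese Restaurant Process with
    concentration alpha: song n joins an existing cluster with
    probability size/(n + alpha) and opens a new cluster with
    probability alpha/(n + alpha)."""
    rng = random.Random(seed)
    sizes = []
    for n in range(n_songs):
        r = rng.uniform(0, n + alpha)
        if r < alpha or not sizes:
            sizes.append(1)              # open a new cluster
            continue
        acc = alpha
        for i, s in enumerate(sizes):
            acc += s
            if r < acc:
                sizes[i] += 1            # join cluster i, weighted by its size
                break
        else:
            sizes[-1] += 1               # guard against floating-point edge cases
    return len(sizes)

# more concentration mass -> more clusters, for the same number of songs
print(crp_num_clusters(0.05, 5000), crp_num_clusters(5.0, 5000))
```

In the actual model the data likelihood also drives cluster creation, which is why even small α values yield a dozen or more clusters on the MSD features.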
While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process produced 22 clusters; 3 of these contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters mapped easily to clusters in the other two α runs, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing clusters. The y-axis values for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies for fixing them; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment

While I made every effort to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that the dataset had some credibility. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
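A sketch of the song-level filtering such a dataset would enable (the `is_em_song` helper and its abbreviated genre set are hypothetical illustrations, not the Last.fm data format; the thesis itself filters at the artist level with a longer target_genres list):

```python
# abbreviated, hypothetical genre list; the real filter uses ~20 genre tags
EM_GENRES = {"house", "techno", "trance", "ambient", "breakbeat", "downtempo"}

def is_em_song(song_tags):
    """Song-level filter: keep a track only when its OWN tags
    (rather than its artist's tags) intersect the EM genre list."""
    return any(tag.lower() in EM_GENRES for tag in song_tags)

print(is_em_song(["Techno", "90s", "german"]))   # True
print(is_em_song(["classic rock", "70s"]))       # False
```

Filtering on per-song tags would keep a rock artist's one techno remix while excluding the rest of their catalog, which the artist-level filter cannot do.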
Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
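The model-selection step mentioned here, choosing the number of timbre categories by minimizing the Bayesian Information Criterion, can be sketched with scikit-learn. The data below is synthetic (three well-separated stand-ins for 12-dimensional timbre segments) and `best_k_by_bic` is my illustrative helper, not the thesis code:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# synthetic stand-in for 12-dimensional timbre segments: three separated groups
X = np.vstack([rng.normal(m, 1.0, size=(150, 12)) for m in (0.0, 6.0, 12.0)])

def best_k_by_bic(X, k_range):
    # fit a Gaussian mixture for each candidate count; keep the BIC-minimizing one
    bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
            for k in k_range}
    return min(bics, key=bics.get)

print(best_k_by_bic(X, range(1, 7)))
```

On the real segment data the same loop, run over a wider range of candidate counts, is what would justify the 46-category choice.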
4.2 Future Work

Future work in this area (quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists) would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the songs accessed from the dataset and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics continues to grow, and groups such as Spotify amass greater amounts of information and deeper insights from it, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub(r'/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture
10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 import ast15
16 prevents output from showing ellipses when printed17 npset_printoptions(threshold=npnan)
58
18
19 column-wise mean of list of lists20 def mean(a)21 return sum(a) len(a)22
23 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo
24
basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# regex reconstructed; the exact original pattern was lost in transcription
for json_object_match in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1

    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean value of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
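As a sanity check on the chord-shift encoding in the script above, the snippet below recomputes the category index for a few transitions. The function name `chord_shift_category` is mine, not part of the thesis code; the arithmetic mirrors the loop above.

```python
def chord_shift_category(c1, c2):
    """Map a pair of chords to one of the 192 chord-shift categories.

    A chord is a tuple (chord_type, root): chord_type is 1=major, 2=minor,
    3=dominant 7th, 4=minor 7th; root is a pitch class 0-11.
    """
    # number of semitones up from the first root to the second
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    # the 16 ordered pairs of chord types, numbered 1..16
    key_shift = 4 * (c1[0] - 1) + c2[0]
    # final index in 0..191
    return 12 * (key_shift - 1) + note_shift

# same-type, same-root transitions land on multiples of 60,
# which is why types 0, 60, 120, and 180 recur in the analysis chapter
print(chord_shift_category((1, 0), (1, 0)))  # major -> major: 0
print(chord_shift_category((2, 5), (2, 5)))  # minor -> minor: 60
print(chord_shift_category((3, 9), (3, 9)))  # dom7 -> dom7: 120
print(chord_shift_category((4, 2), (4, 2)))  # min7 -> min7: 180
```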
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# regex reconstructed; the exact original pattern was lost in transcription
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
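The `prob` line in the script above implements year-balanced sampling: a song is kept with probability min(1, N / count_for_its_year), so every year contributes roughly N songs in expectation no matter how over-represented it is. A minimal standalone illustration:

```python
N = 20  # target number of songs per year, as in the script above

def keep_probability(year_count, n=N):
    """Probability of keeping any one song from a year with year_count songs."""
    return min(1.0, float(n) / year_count)

# a year with fewer than N songs is always kept in full;
# a dense year such as 2007 (2175 songs in the table above) is heavily
# thinned, so it still contributes about N songs in expectation
print(keep_probability(5))     # 1.0
print(keep_probability(2175))  # about 0.0092
```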
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (chord type, root)
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
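Both matching functions above score a candidate vector against each template with an unnormalized Pearson-style correlation and keep the template with the largest |ρ|. The snippet below is a self-contained sketch of that idea using two hypothetical 12-bin pitch templates (it is an illustration of the scoring rule, not the thesis functions themselves):

```python
import numpy as np

# two 12-bin pitch-class templates: C major (C, E, G) and C minor (C, Eb, G)
TEMPLATES = {
    'C major': np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float),
    'C minor': np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0], dtype=float),
}

def best_template(pitch_vector):
    """Return the template name with the largest |correlation| score.

    Mirrors the scoring rule in the helpers above: centered dot product
    divided by the two standard deviations, each damped by +0.01 to
    avoid division by zero on flat vectors.
    """
    pitch_vector = np.asarray(pitch_vector, dtype=float)
    best_name, rho_max = None, 0.0
    for name, tpl in TEMPLATES.items():
        num = np.sum((tpl - tpl.mean()) * (pitch_vector - pitch_vector.mean()))
        rho = num / ((tpl.std() + 0.01) * (pitch_vector.std() + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max, best_name = rho, name
    return best_name

# energy concentrated on C, E, G matches the major template;
# energy on C, Eb, G matches the minor one
print(best_template([0.9, 0.1, 0.0, 0.1, 0.8, 0.0, 0.1, 0.9, 0.0, 0.1, 0.0, 0.1]))  # C major
print(best_template([0.9, 0.0, 0.1, 0.8, 0.1, 0.0, 0.1, 0.9, 0.1, 0.0, 0.1, 0.0]))  # C minor
```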
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.
[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec. 2005.
Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05
3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only
one song each; I listened to both of these songs, and they did not sound unique, so
I discarded them from the clusters). Again, the song distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them,
for a total of 19 significant clusters. Again, the song distributions, timbre and pitch
distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset: mainly, which artists and songs were unique,
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process, these songs were lumped together with the common theme of dense, melodic
sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-
note rhythm that, combined with the ethereal synths that contain certain chord
progressions, sounded very similar to trance music. In fact, I found that, stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
music was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself, but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change; type 60, minor → minor with no note change; type
120, dominant 7th major → dominant 7th major with no note change; and type 180,
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that chords occurring next to each other in the song are remaining in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to intuitively interpret. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in my study. I ended up comparing my subjective
summaries of each cluster against the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data is very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
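The comparisons that follow hinge on how the concentration parameter α controls the number of clusters a Dirichlet Process tends to produce: under the Chinese Restaurant Process view, the expected number of occupied clusters after n songs is the sum of α/(α+i) for i = 0, ..., n-1, which grows roughly like α log n. The sketch below illustrates only the monotone dependence on α; the song count n = 35000 is a round stand-in rather than the exact dataset size, and the absolute numbers from this prior-only formula are far smaller than the cluster counts reported here, since in a fitted mixture the likelihood also creates clusters.

```python
def expected_clusters(alpha, n):
    """Expected number of occupied clusters in a Chinese Restaurant
    Process with concentration alpha after n customers (songs)."""
    return sum(alpha / (alpha + i) for i in range(n))

n = 35000  # rough stand-in for the number of EM songs analyzed
for alpha in (0.05, 0.1, 0.2):
    # larger alpha -> more expected clusters, matching the runs compared below
    print(alpha, round(expected_clusters(alpha, n), 3))
```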
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value, but also comparing the clusters formed
to other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely, while others were more difficult to interpret. For example, cluster
3 from the α = 0.1 run contained a similar number of songs and a similar
distribution of release years to cluster 9 from the α = 0.05 run. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05).
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 005 group differentiated mostly on general moods and classes of
instruments (like rock vs non-electronic vs electronic) the α = 01 group picked
up more nuanced instrumentation and mood differences For example cluster 1601
contained songs that featured orchestral string instruments especially violin The
songs themselves varied significantly according to traditional genres from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal
band which contained violin interludes This clustering raises an interesting point
that music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds Another cluster 2801 features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through
the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating
a style more popular in the 1980s and the characteristic sound high-hat cymbals
is also a specialized instrument This specialization does not match up particularly
strongly with the clusters when α = 005 That is a single cluster with α = 005
does not easily map to one or more clusters in the α = 01 run although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables The timbrechord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster For
example the last timbre category is significantly pronounced for clusters 5 and 18
and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds
so it would make sense that cluster 5 which was mainly calm New World also
contained vocal-free ethereal and space-y sounds It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machine, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this clustering I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
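One way to make the informal cluster matching above concrete is a simple contingency count over the two runs' label assignments: if one low-α cluster maps cleanly onto one high-α cluster, its songs should land in a single cell of the table. The label arrays below are hypothetical stand-ins for the actual α = 0.05 and α = 0.1 assignments, and `cluster_overlap` is my own helper, not part of the thesis code base.

```python
from collections import Counter

def cluster_overlap(labels_a, labels_b):
    """Count how often songs co-occur across two clusterings of the same songs."""
    return Counter(zip(labels_a, labels_b))

# Hypothetical cluster assignments for six songs under two runs.
run_low  = [0, 0, 1, 1, 2, 2]   # e.g. alpha = 0.05
run_high = [3, 3, 3, 5, 7, 8]   # e.g. alpha = 0.1

overlap = cluster_overlap(run_low, run_high)
# Cluster 0 (low alpha) maps entirely into cluster 3 (high alpha),
# while cluster 1 is split between clusters 3 and 5.
print(overlap[(0, 3)])  # 2
```

A high-α run that "does not easily map" shows up here as rows spread thinly over many columns; measures like the adjusted Rand index summarize the same table in a single number.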
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also reflect the difficulty of distinguishing the clusters: the y-axis ranges for all of the years are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
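The qualitative pattern driving this comparison, that a larger concentration parameter yields more clusters, follows from the Chinese Restaurant Process view of the Dirichlet Process prior. The sketch below simulates that prior directly; the α values and sample size are purely illustrative and are not on the same scale as the 0.05, 0.1, and 0.2 settings above, since the effective scale depends on the implementation and likelihood.

```python
import random

def crp_sample(n, alpha, seed=0):
    """Draw one partition of n items from a Chinese Restaurant Process.
    Returns the list of table (cluster) sizes."""
    rng = random.Random(seed)
    tables = []
    for i in range(n):
        # new table with probability alpha / (i + alpha),
        # existing table t with probability tables[t] / (i + alpha)
        r = rng.random() * (i + alpha)
        if r < alpha:
            tables.append(1)
        else:
            r -= alpha
            for t in range(len(tables)):
                if r < tables[t]:
                    tables[t] += 1
                    break
                r -= tables[t]
    return tables

# Larger concentration -> more clusters on average, as observed in the runs above.
for alpha in (0.5, 5.0, 50.0):
    ks = [len(crp_sample(1000, alpha, seed=s)) for s in range(20)]
    print(alpha, sum(ks) / len(ks))
```

The expected number of clusters grows roughly like α·log(1 + n/α), which is why sweeping α and inspecting the resulting clusters, as done above, is the natural way to tune the model's granularity.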
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies for addressing them; I then offer potential paths for researchers to build upon my experiment and close with final remarks on this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, which lent the dataset some credibility. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from Last.fm, which contains user-generated tags at the song level.
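A song-level filter of that kind could be sketched as follows. The genre set and the `is_em_song` helper here are hypothetical illustrations of the idea, not part of the thesis code base, and real song-level tags would come from the Last.fm companion dataset to the MSD.

```python
# Assumed, abbreviated genre list mirroring the artist-level filter used in A.1.
EM_GENRES = {'house', 'techno', 'trance', 'dubstep', 'ambient', 'idm',
             'drum and bass', 'breakbeat', 'downtempo', 'electronic'}

def is_em_song(tags, min_matches=1):
    """Keep a song only if its OWN tags (not its artist's) hit the EM list."""
    normalized = {t.strip().lower() for t in tags}
    return len(normalized & EM_GENRES) >= min_matches

# Hypothetical Last.fm-style user tags attached to individual tracks.
print(is_em_song(['Techno', 'dance']))          # True
print(is_em_song(['rock', 'guitar', 'indie']))  # False
```

Raising `min_matches` would trade recall for precision, which addresses exactly the failure mode described above: a rock song slipping in because its artist once released an electronic track.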
Another, more addressable, weakness in my experiment was the graphical analysis of the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
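The BIC-based choice of 46 timbre categories mentioned above follows the standard definition BIC = p·ln(n) - 2·ln(L), selecting the candidate with the lowest score. A minimal sketch of that selection step follows; the log-likelihoods, parameter counts, and sample size are entirely hypothetical, chosen only to show the penalty term overtaking the likelihood gain.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayes Information Criterion: lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

# Hypothetical (log-likelihood, parameter count) pairs for mixtures with
# increasing numbers of timbre categories; n is the number of timbre frames.
n = 10000
candidates = {
    30: (-52000.0, 30 * 25),
    46: (-49500.0, 46 * 25),
    60: (-49300.0, 60 * 25),
}
best_k = min(candidates, key=lambda k: bic(candidates[k][0], candidates[k][1], n))
print(best_k)  # 46
```

The 60-category model fits slightly better in raw likelihood, but the ln(n) penalty on its extra parameters outweighs the gain, which is the mechanism that picked out 46 categories in the experiment.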
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the limited set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, the dataset's songs are accessible, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
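The novelty-versus-popularity question could be tested with a rank correlation once per-artist novelty scores and play counts are available. A stdlib-only sketch, with entirely hypothetical numbers and no tie handling:

```python
def rank(values):
    """0-based ranks for a list of numbers (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def spearman(x, y):
    """Spearman rank correlation, assuming no tied values."""
    n = len(x)
    rx, ry = rank(x), rank(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical per-artist novelty scores and popularity (play counts).
novelty    = [0.9, 0.7, 0.4, 0.2, 0.1]
popularity = [120, 300, 450, 800, 900]
print(spearman(novelty, popularity))  # -1.0 (perfect inverse rank order)
```

A strongly negative correlation like the toy value here would suggest that the most innovative artists for their time were not the most played, one concrete way of answering the overlap question posed above.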
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been approached from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for every electronic music song
found in the MSD and writes it to a file, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep',
    'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
    'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# strip the path separator from the shard name when building the output filename
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre
category counts for each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# match each song's metadata dict in the raw file
for json_object_str in re.finditer(r"\{'title'.*?\]\}", json_contents, re.DOTALL):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1

    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
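As a sanity check on the indexing used in the listing above, the shift computation can be isolated into a small standalone helper (the function name is mine; the arithmetic mirrors the inner loop, which maps a pair of consecutive chords to one of 12 × 16 = 192 chord-shift categories):

```python
def chord_shift_category(c1, c2):
    """c1, c2 are (chord_type, root_index) pairs as returned by
    find_most_likely_chord in A.4; types are 1=major, 2=minor,
    3=dominant 7th, 4=minor 7th, and roots are 0..11 semitones."""
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]   # chord-type transition, 1..16
    return 12 * (key_shift - 1) + note_shift

# A major chord moving up 2 semitones to another major chord:
print(chord_shift_category((1, 0), (1, 2)))   # 2
# The largest possible index, confirming the 192-category range:
print(chord_shift_category((4, 1), (4, 0)))   # 191
```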
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\]\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw MSD data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (family, root) pair
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
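For illustration, the correlation-based assignment above can be exercised end to end with made-up centroids. This is a minimal sketch: the two TIMBRE_CLUSTERS stand-ins and the sample frame below are hypothetical values invented for demonstration, not data from the thesis.

```python
import numpy as np

# Hypothetical stand-ins for the learned TIMBRE_CLUSTERS centroids
# (the real ones are derived from the MSD timbre data, not shown here).
TIMBRE_CLUSTERS = [
    np.array([5.0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),  # energy concentrated in dim 0
    np.array([1, 1, 1, 1, 1, 5.0, 1, 1, 1, 1, 1, 1]),  # energy concentrated in dim 5
]

def most_likely_timbre_category(timbre_vector):
    """Assign a 12-dim timbre frame to the centroid with the highest
    absolute correlation, mirroring find_most_likely_timbre_category."""
    best, rho_max = 0, 0.0
    for idx, seg in enumerate(TIMBRE_CLUSTERS):
        rho = np.corrcoef(seg, timbre_vector)[0, 1]
        if abs(rho) > abs(rho_max):
            rho_max, best = rho, idx
    return best

frame = np.array([4.5, 0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.0, 1.1])
print(most_likely_timbre_category(frame))  # prints 0: the frame's energy sits in dim 0
```

Using `np.corrcoef` rather than the hand-rolled loop gives the same winner here; the appendix version adds a small constant to the standard deviations to guard against flat segments.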
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec. 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin; rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, a total of 22 clusters formed. Three of the clusters consisted
of one song each, none of which were particularly unique-sounding, so I discarded
them, for a total of 19 significant clusters. Again, the song distributions, timbre
and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process these songs were lumped together under the common theme of dense, melodic
sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most
prominent artists from the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note
rhythm that, combined with the ethereal synths and their chord progressions,
sounded very similar to trance music. In fact, I found that, stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change; type 60, minor → minor with no note change; type
120, dominant 7th major → dominant 7th major with no note change; and type 180,
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that adjacent chords in a song are remaining in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to interpret intuitively. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in mine. I ended up writing subjective
summaries of each cluster and comparing the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data are very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value but also comparing the clusters formed
to those for other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of
only one song each, and, upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster
3₀.₁ (cluster 3 when α = 0.1) contained a similar number of songs and a similar
distribution of release years to cluster 9₀.₀₅. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3₀.₁, they were different from the earliest artists in cluster 9₀.₀₅.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 grouping differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked
up more nuanced instrumentation and mood differences. For example, cluster 16₀.₁
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds. Another cluster, 28₀.₁, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6₀.₁, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 consisted of vocal-free, ethereal, space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR drum machines, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this value of α, I
added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters. Three of these clusters contained only one song each, and, upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to differentiate subjectively, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters for the other two α values, like cluster 17₀.₂, which contains Roland TR
drum machine sounds and is comparable to cluster 28₀.₁. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing the clusters. The y-axis values for all of the years are quite small,
implying that many of the timbre values averaged out because the songs in each
cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28₀.₁, for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
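The effect of α described above can be reproduced with scikit-learn's current Dirichlet Process mixture implementation. This is a minimal sketch on synthetic two-dimensional data, not the thesis's feature vectors; the thesis used the older sklearn.mixture interface, for which BayesianGaussianMixture with a Dirichlet Process prior is the modern equivalent.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# Synthetic stand-in for the per-song feature vectors: three separated groups.
X = np.vstack([rng.normal(loc, 0.3, size=(100, 2)) for loc in (-4.0, 0.0, 4.0)])

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=30,                            # truncation level (upper bound)
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,           # the concentration parameter α
        random_state=0,
    ).fit(X)
    used = np.unique(dpgmm.predict(X)).size         # components that receive songs
    print(alpha, used)
```

Larger α places more prior weight on opening new components, which is consistent with the cluster counts above growing as α moved from 0.05 to 0.2.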
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; then I offer potential paths for researchers to build upon my
experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there
were various limiting factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the
electronic music available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included were
represented by only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criterion
involved iterating through every song and selecting those whose artist carried a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable, weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed were nontrivial: I could not only
eyeball the measurements on each timbre graph, as I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successfully the Dirichlet
Process clusters songs into distinct categories.
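The BIC sweep mentioned above (the criterion was minimized at 46 timbre categories) follows a standard pattern; here is a small sketch on synthetic data, not the thesis's timbre frames.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Synthetic timbre-like frames drawn from two well-separated components.
X = np.vstack([rng.normal(-3.0, 1.0, size=(200, 3)),
               rng.normal(3.0, 1.0, size=(200, 3))])

# Fit mixtures of increasing size and keep the BIC-minimizing component count
# (the thesis swept a much larger range over real timbre data).
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 6)}
best_k = min(bics, key=bics.get)
print(best_k)
```

Because BIC penalizes parameters by ln(n) per parameter, it stops rewarding extra components once the likelihood gain flattens, which is how a single minimizing count (46 in the thesis) emerges.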
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions, procedures,
evaluations of whether clustering was effective, and musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without extensive
funding and influence. Once these problems are resolved, and the dataset, the
songs accessed from it, and the methods for comparing songs to each other are in
place, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists, for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows and groups such as Spotify amass greater amounts of information and derive
deeper insights from it, this relatively new field of study will hopefully grow with
them. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has, up
until now, been analyzed mostly from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code collects the relevant metadata of each electronic song in the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
    'industrial', 'synthpop', 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters    # not on adroit
import sklearn.mixture
import msd_utils       # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20    # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r'{.*?title.*?}', re.DOTALL)  # exact pattern partly lost in transcription
N = 20    # number of songs to sample from each year
k = 20    # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
def _chord_rows(patterns):
    # expand '100010010000' into [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    return [[int(c) for c in p] for p in patterns]

CHORD_TEMPLATE_MAJOR = _chord_rows([
    '100010010000', '010001001000', '001000100100', '000100010010',
    '000010001001', '100001000100', '010000100010', '001000010001',
    '100100001000', '010010000100', '001001000010', '000100100001'])
CHORD_TEMPLATE_MINOR = _chord_rows([
    '100100010000', '010010001000', '001001000100', '000100100010',
    '000010010001', '100001001000', '010000100100', '001000010010',
    '000100001001', '100010000100', '010001000010', '001000100001'])
CHORD_TEMPLATE_DOM7 = _chord_rows([
    '100010010010', '010001001001', '101000100100', '010100010010',
    '001010001001', '100101000100', '010010100010', '001001010001',
    '100100101000', '010010010100', '001001001010', '000100100101'])
CHORD_TEMPLATE_MIN7 = _chord_rows([
    '100100010010', '010010001001', '101001000100', '010100100010',
    '001010010001', '100101001000', '010010100100', '001001010010',
    '000100101001', '100010010100', '010001001010', '001000100101'])

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-03, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key)
                            for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
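The chord detector above scores a chroma frame against each 12-bin template with a correlation-style statistic. The same idea can be sketched more compactly with NumPy's built-in Pearson correlation (Python 3 here, and only the 12 major-triad templates for brevity; the names below are illustrative, not part of the thesis code):

```python
import numpy as np

# 12-bin chroma templates for the 12 major triads (C major = C, E, G)
MAJOR_TEMPLATES = [np.roll([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], k) for k in range(12)]

def best_major_chord(pitch_vector):
    """Return the root (0 = C, ..., 11 = B) of the major triad whose
    template correlates most strongly with the chroma vector."""
    pv = np.asarray(pitch_vector, dtype=float)
    scores = [np.corrcoef(t, pv)[0, 1] for t in MAJOR_TEMPLATES]
    return int(np.argmax(scores))

# a chroma frame dominated by C, E, and G matches the C major template
frame = [0.9, 0.1, 0.05, 0.1, 0.8, 0.1, 0.05, 0.85, 0.1, 0.05, 0.1, 0.05]
print(best_major_chord(frame))  # 0, i.e. C major
```

The appendix version additionally scores the minor, dominant 7th major, and dominant 7th minor templates and returns a (chord type, root) pair rather than a root alone.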
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/press-releases/spotify-acquires-echo-nest/, Mar. 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec. 2005.
Figure 3.3: Song year distributions for α = 0.1
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats; ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm New Age
6        542         Hi-hat cymbals; dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin; rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. Three of the clusters consisted of one song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths; atmospheric
4        343         Arpeggios
5        304         Electro ambient
6        2405        Alien synths; eerie
7        1264        Punchy kicks and claps; 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist ambient (non-electronic instruments)
14       765         Downtempo; classic guitar riffs; fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast ambient; classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms; one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
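The effect of α discussed throughout this chapter can be reproduced in miniature with scikit-learn's current Dirichlet Process interface. The sketch below is illustrative only: it runs on synthetic two-dimensional data rather than the thesis's pitch and timbre features, and it uses BayesianGaussianMixture, which superseded the DPGMM class available when this thesis was written. It shows how the concentration parameter governs how many of the truncated components end up with appreciable weight.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# toy stand-in for per-song features: three well-separated 2-D blobs
X = np.vstack([rng.randn(200, 2) + [0, 0],
               rng.randn(200, 2) + [8, 0],
               rng.randn(200, 2) + [0, 8]])

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=20,                                   # truncation level
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,                  # the α discussed above
        max_iter=500, random_state=0).fit(X)
    n_used = np.sum(dpgmm.weights_ > 0.01)                 # components with real mass
    print(alpha, n_used)
```

In general, a larger α makes the process more willing to open new clusters, consistent with the growth in cluster counts reported above, although on data this cleanly separated the posterior settles on the same few dominant components for all three values.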
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in the song are remaining in the same key for the majority of the song.

The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
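The chord-change type numbers quoted above (0, 60, 120, 180) follow directly from the index computation in Appendix A.2. Restated as a standalone Python 3 function (the appendix code is Python 2), with chord types numbered 1 = major, 2 = minor, 3 = dominant 7th major, 4 = dominant 7th minor, the encoding spans 12 × 16 = 192 categories:

```python
def chord_shift_category(c1, c2):
    """Category index for a transition between two chords, each given as
    (chord_type, root_note): chord_type 1-4 (major, minor, dominant 7th
    major, dominant 7th minor), root_note 0-11 (C through B)."""
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]       # wrap around the octave
    key_shift = 4 * (c1[0] - 1) + c2[0]       # 1..16 index over type pairs
    return 12 * (key_shift - 1) + note_shift  # 0..191

# same-type transitions with no root movement land on types 0, 60, 120, 180
print([chord_shift_category((t, 5), (t, 5)) for t in (1, 2, 3, 4)])  # [0, 60, 120, 180]
```

The heavily populated bins discussed above are exactly these four "no movement" indices, one per chord type.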
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those of other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3 in the α = 0.1 run contained a similar number of songs and a similar distribution of release years to cluster 9 in the α = 0.05 run. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process
formed 22 clusters; 3 of these clusters contained only one song each, and upon
listening to each of these songs I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to differentiate subjectively, this run was
quite difficult. Slightly more than half of the clusters (10 out of 19) contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters from the other two α values, like cluster 17 (α = 0.2), which contains Roland TR
drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster
often vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing the clusters: the y-axes for all of the years are quite small,
implying that many of the timbre values averaged out because the songs in each cluster were quite
different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small amount of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters that a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications for the electronic music
in the MSD are the clusters described in the table for α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings for
α = 0.05 and α = 0.1.
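The effect of α on the number of clusters can be made concrete with a small simulation of the Chinese Restaurant Process view of the Dirichlet Process. This is an intuition-building sketch only: the α values and song counts below are illustrative, the expected number of clusters grows roughly like α log n, and the actual model in this thesis also conditions on the pitch and timbre features rather than drawing assignments blindly.

```python
import random

def crp_num_clusters(n_songs, alpha, seed=0):
    """Number of clusters in one Chinese Restaurant Process draw:
    song i starts a new cluster with probability alpha / (i + alpha),
    otherwise joins an existing cluster in proportion to its size."""
    rng = random.Random(seed)
    cluster_sizes = []
    for i in range(n_songs):
        if rng.random() < alpha / (i + alpha):
            cluster_sizes.append(1)            # open a new cluster
        else:
            r = rng.uniform(0, i)              # pick an occupied "seat" uniformly
            for j, size in enumerate(cluster_sizes):
                r -= size
                if r < 0:
                    cluster_sizes[j] += 1      # join cluster j
                    break
            else:                              # guard against float edge cases
                cluster_sizes[-1] += 1
    return len(cluster_sizes)

# larger alpha tends to open more clusters
for alpha in (0.5, 5.0, 50.0):
    print(alpha, crp_num_clusters(10000, alpha, seed=1))
```

The qualitative behavior is the point: holding the data-generating process fixed, a larger concentration parameter yields more, smaller clusters, which is why the three runs above had to be compared by ear rather than by cluster count alone.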
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; I then offer potential paths for researchers to build upon my
experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, various factors
worked against it, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the electronic
music available. From looking through the tracks, I did see many important
artists, lending some credibility to the dataset. However, several
other artists were conspicuously missing, and the artists included were represented by
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past five years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive given additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criterion
involved iterating through every song and selecting those whose artist carried a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music in rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset
from a group called Last.fm, which contains user-generated tags at the song level.
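A song-level filter of the kind such tag data would enable might look like the following sketch. The `song_tags` mapping, the track IDs, and the tag lists are hypothetical stand-ins for illustration, not the MSD fields or the Last.fm API.

```python
# Hypothetical song-level tags (Last.fm style), keyed by track ID.
EM_GENRES = {'house', 'techno', 'trance', 'dubstep', 'ambient', 'idm'}

song_tags = {
    'TR001': ['techno', 'electronic'],
    'TR002': ['rock', 'guitar'],       # the same artist may also release rock songs
    'TR003': ['ambient', 'chillout'],
}

def is_em_song(track_id):
    """Keep a track only if one of its OWN tags names an EM genre,
    rather than inheriting the classification from the artist's tags."""
    return any(tag in EM_GENRES for tag in song_tags.get(track_id, []))

em_tracks = [t for t in sorted(song_tags) if is_em_song(t)]
print(em_tracks)  # ['TR001', 'TR003']
```

This is the key difference from the artist-level filter used in Appendix A.1: a rock track by a mostly-electronic artist would no longer slip into the corpus.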
Another, more addressable weakness in my experiment was the graphical analysis of the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the measurements on each graph for timbre, as I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successful the Dirichlet
Process is at clustering songs into distinct categories.
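The selection rule behind the Bayesian Information Criterion comparison mentioned above is BIC = k ln(n) − 2 ln(L̂), minimized over candidate models. The sketch below illustrates only the rule itself: the observation count, free-parameter counts, and log-likelihoods are invented for illustration, chosen so the example lands on the 46-category outcome reported in the text; they are not the thesis's fitted values.

```python
import math

def bic(num_params, n_obs, log_likelihood):
    """Bayesian Information Criterion: lower is better. The k*ln(n) term
    penalizes models with more free parameters."""
    return num_params * math.log(n_obs) - 2.0 * log_likelihood

# illustrative (num_components, num_free_params, fitted log-likelihood) triples
n_obs = 5000
candidates = [
    (30, 750, -61200.0),
    (46, 1150, -58900.0),
    (60, 1500, -58700.0),
]

best = min(candidates, key=lambda c: bic(c[1], n_obs, c[2]))
print('best number of timbre categories:', best[0])  # 46
```

The middle model wins because the 60-component fit improves the likelihood too little to pay for its extra parameters, which is exactly the trade-off BIC formalizes.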
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue, building an effective corpus of
music data for the MSD and constantly updating it, might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, the dataset's
songs accessed, and methods for comparing songs to each other established,
the next steps would be to analyze the results further. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows and groups such as Spotify amass greater amounts of information
and deeper insights from that information, this relatively new field of study will hopefully
grow as well. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been analyzed mostly from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import time
import glob
import re
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
    'industrial', 'synthpop', 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print ('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, Dec. 2005.
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats; ambient
4        2446        Strong simultaneous beat and synths; synths defined but echoing
5        2672        Calm New Age
6        542         Hi-hat cymbals; dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin; rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds; no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2
With α set to 0.2, a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths; atmospheric
4        343         Arpeggios
5        304         Electro ambient
6        2405        Alien synths; eerie
7        1264        Punchy kicks and claps; 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature; synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist ambient (non-electronic instruments)
14       765         Downtempo; classic guitar riffs; fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast ambient; classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms; one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens: the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats; another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then rises rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.
Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and, upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05; that is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong, repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α runs, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing different clusters: the y-axis values for all of the years are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies for addressing them; I then offer potential paths for researchers to build upon my experiment and close with final remarks regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
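The model-selection step mentioned above, choosing the number of timbre categories by lowest BIC, can be illustrated with a small scikit-learn sketch. The data here is synthetic; the thesis's actual 46-category fit was run on the real sampled timbre frames:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# three well-separated synthetic "timbre" groups in 3 dimensions
X = np.vstack([rng.normal(m, 0.5, size=(200, 3)) for m in (-3.0, 0.0, 3.0)])

bics = {}
for k in range(1, 8):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics[k] = gm.bic(X)  # lower is better
best_k = min(bics, key=bics.get)
print(best_k)  # 3 for this well-separated toy data
```

The BIC trades goodness of fit against a penalty on the number of parameters, so on this cleanly separated data it recovers the generating number of components.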
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the title, artist, year, duration, and pitch and timbre
segments of every electronic song found in the Million Song Dataset'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count {0}'.format(count))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing Total time elapsed {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
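The raw per-segment pitch and timbre arrays written out above are later averaged in blocks of five segments (Section A.2). A minimal, self-contained sketch of that column-wise block mean (`smooth` is an illustrative name, not a function in the thesis code):

```python
import math

def smooth(segments, factor=5):
    # column-wise mean over consecutive blocks of `factor` segments;
    # trailing segments that do not fill a full block are dropped,
    # matching the floor division used in Section A.2
    out = []
    for i in range(int(math.floor(len(segments) / factor))):
        block = segments[factor * i: factor * i + factor]
        out.append([sum(col) / len(col) for col in zip(*block)])
    return out

print(smooth([[0, 12], [2, 10], [4, 8], [6, 6], [8, 4]]))  # [[4.0, 8.0]]
```

Each output row is one smoothed 12-bin (pitch) or 12-dimensional (timbre) vector, which is what the template-matching helpers in Section A.4 consume.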
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# mean of a list; applied column-wise via zip(*...) below
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
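As a sanity check on the encoding in the loop above, the chord-change categories named in Section 3.3 (types 0, 60, 120, and 180 for same-type, same-root changes) fall out of the same arithmetic. This standalone restatement assumes chord types are numbered 1 through 4 (major, minor, dominant 7th, minor 7th), matching the templates in Section A.4:

```python
def chord_shift(c1, c2):
    # c1, c2 are (chord_type, root) pairs: chord_type in 1..4,
    # root in 0..11 (C natural up to B natural)
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]       # 1..16: ordered pair of chord types
    return 12 * (key_shift - 1) + note_shift  # one of 192 categories

print(chord_shift((1, 0), (1, 0)))  # major -> major, same root: 0
print(chord_shift((2, 5), (2, 5)))  # minor -> minor, same root: 60
print(chord_shift((3, 7), (3, 7)))  # dom7 -> dom7, same root: 120
print(chord_shift((4, 2), (4, 2)))  # min7 -> min7, same root: 180
```

Since there are 16 ordered type pairs and 12 possible root shifts, the categories cover exactly the 192 bins of the `chord_changes` histogram.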
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{.*?'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
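For reference, the inclusion probability above, min(1, N/year_count), caps each year's expected contribution at roughly N songs, so heavily represented years do not dominate the sample. A minimal standalone sketch of the idea (illustrative only, written in Python 3 with a made-up subset of the per-year counts; not the thesis script itself):

```python
import random

def inclusion_prob(n_target, year_count):
    # include each song with probability min(1, n_target / year_count),
    # so each year contributes about n_target songs in expectation
    return min(1.0, float(n_target) / year_count)

random.seed(0)
year_counts = {1980: 131, 2005: 1995}  # made-up subset of per-year song counts
sampled = {year: 0 for year in year_counts}
for year, count in year_counts.items():
    p = inclusion_prob(20, count)
    for _ in range(count):
        if random.random() < p:
            sampled[year] += 1
# both years end up with roughly 20 songs despite very different sizes
```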
A4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C natural
# and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-03, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-03],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-03, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
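The four template loops in find_most_likely_chord repeat the same regularized Pearson-style correlation against each template bank. For reference, an equivalent vectorized form (a sketch assuming numpy; not part of the original thesis code) could be:

```python
import numpy as np

def correlate_with_templates(vector, templates):
    """Return the index of the template whose (regularized) Pearson-style
    correlation with `vector` has the largest magnitude, plus that score."""
    templates = np.asarray(templates, dtype=float)
    v = np.asarray(vector, dtype=float)
    t_means = templates.mean(axis=1, keepdims=True)
    t_stds = templates.std(axis=1) + 0.01        # same +0.01 smoothing as above
    v_centered = v - v.mean()
    v_std = v.std() + 0.01
    # one correlation score per template, computed in a single matrix product
    rhos = (templates - t_means) @ v_centered / (t_stds * v_std)
    best = int(np.argmax(np.abs(rhos)))
    return best, rhos[best]

# a pitch vector peaking on C, E and G should match the C-major template (index 0)
DEMO_TEMPLATES = [[1,0,0,0,1,0,0,1,0,0,0,0],   # C major
                  [0,1,0,0,0,1,0,0,1,0,0,0]]   # C-sharp major
idx, rho = correlate_with_templates(
    [0.9,0.1,0.0,0.1,0.8,0.1,0.0,0.9,0.1,0.0,0.1,0.0], DEMO_TEMPLATES)
```

The per-template means and standard deviations play the same role as the precomputed CHORD_TEMPLATE_*_means and CHORD_TEMPLATE_*_stdevs lists above.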
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: the drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-35, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
Figure 3.4: Timbre and pitch distributions for α = 0.1
Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats; ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm New Age
6         542          Hi-hat cymbals; dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. Three of the clusters consisted of one song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths; atmospheric
4         343          Arpeggios
5         304          Electro ambient
6         2405         Alien synths; eerie
7         1264         Punchy kicks and claps; 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature; synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo; classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient; classic orchestral
23        578          Very repetitive rhythms? -- Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset: mainly, which artists and songs were unique, and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the year distributions of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time, and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
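The direction of the effect observed across the runs, more clusters as α grows (16 clusters at α = 0.1 versus 22 at α = 0.2), is exactly what the Dirichlet Process prior predicts: under its Chinese-restaurant-process representation, each new song starts a new cluster with probability proportional to α. A short illustrative simulation (a sketch of the prior alone, not the thesis's fitting code; the corpus size here is hypothetical):

```python
import random

def crp_num_clusters(n, alpha, rng):
    """Seat n items by the Chinese restaurant process with concentration
    alpha and return the number of clusters ("tables") that form."""
    cluster_sizes = []
    for i in range(n):
        # item i starts a new cluster with probability alpha / (alpha + i)
        if rng.random() < alpha / (alpha + i):
            cluster_sizes.append(1)
        else:
            # otherwise it joins an existing cluster proportionally to its size
            r = rng.random() * i
            acc = 0.0
            for c, size in enumerate(cluster_sizes):
                acc += size
                if r < acc or c == len(cluster_sizes) - 1:
                    cluster_sizes[c] += 1
                    break
    return len(cluster_sizes)

rng = random.Random(42)
n = 5000  # hypothetical corpus size
small = sum(crp_num_clusters(n, 0.05, rng) for _ in range(10)) / 10.0
large = sum(crp_num_clusters(n, 0.2, rng) for _ in range(10)) / 10.0
# on average, the larger concentration parameter produces more clusters
```

Note that the prior alone yields far fewer clusters than the fitted model does; in the actual experiment the likelihood of the pitch and timbre features drives most of the cluster formation, with α only nudging the tendency to open new clusters.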
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths carrying certain chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category, playing the sounds, and attaching user-based interpretations gathered from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. This research confirmed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between clusters will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither song sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.
One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different by traditional genre labels can be grouped together on certain shared instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s; its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre and chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it makes sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty of distinguishing different clusters. The y-axis values of these charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies for correcting them; I then offer potential paths for researchers to build upon this work, and close with final remarks regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, various factors
worked against it, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the
electronic music that was available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included contained
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past five years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept: although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist carried a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset
from Last.fm, which contains user-generated tags at the song level.
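A song-level filter of the kind described above might look like the following sketch. The tag set, helper name, and track IDs are hypothetical placeholders, not actual Last.fm data or the thesis's code:

```python
# Hypothetical sketch of song-level (rather than artist-level) genre filtering.
# The genre list, tags, and track IDs below are made-up placeholders.
EM_GENRES = {"house", "techno", "trance", "breakbeat", "ambient", "idm"}

def is_em_song(song_tags):
    """Return True if any song-level tag names a known EM genre."""
    return any(tag.lower() in EM_GENRES for tag in song_tags)

song_level_tags = {
    "TRAAAAA1": ["techno", "dance"],   # kept: the song itself is tagged EM
    "TRAAAAA2": ["rock", "indie"],     # dropped even if the artist has EM tags
}
em_tracks = [tid for tid, tags in song_level_tags.items() if is_em_song(tags)]
```

Filtering on the song's own tags avoids admitting a rock track just because its artist has released EM elsewhere.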
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the timbre measurements on each graph, as I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successful the Dirichlet
Process is at clustering songs into distinct categories.
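The BIC-based choice of the number of timbre categories mentioned above can be illustrated with a small sketch on synthetic 12-dimensional "timbre frames"; the data and the search range are made up for the example, not drawn from the MSD:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# synthetic stand-in for 12-dimensional timbre frames from 3 sound types
frames = np.vstack([rng.normal(m, 0.5, (150, 12)) for m in (-3.0, 0.0, 3.0)])

# fit mixtures of increasing size and keep the component count with lowest BIC
bics = {}
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bics[k] = gmm.bic(frames)
best_k = min(bics, key=bics.get)
```

On this toy data the BIC minimum recovers the three planted sound types; the thesis applied the same criterion to real timbre frames to arrive at 46 categories.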
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, with the dataset's
songs accessible and methods for comparing songs to each other in place, the next
steps would be to further analyze the results. How do the most unique artists for
their time compare to the most popular artists? Is there considerable overlap? How
long does it take for a style to grow in popularity, if it even does? And lastly, how
can these findings be used to compose new genres of music and envision who and
what will become popular in the future? All of these questions may require
supplementary information sources, with respect to the popularity of songs and
artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
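As one example of an "evaluation of whether clustering was effective", future work could compute an internal validity index such as the silhouette score over the fitted cluster labels. The sketch below uses synthetic feature vectors, not the thesis's data, so the numbers are only illustrative:

```python
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# synthetic feature vectors standing in for preprocessed song features
X = np.vstack([rng.normal(0, 1, (150, 4)), rng.normal(6, 1, (150, 4))])

labels = BayesianGaussianMixture(
    n_components=8,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit_predict(X)

# near +1 means tight, well-separated clusters; near 0 means overlapping ones
score = silhouette_score(X, labels)
```

A score like this would give a quantitative complement to listening tests when comparing clusterings produced by different α values.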
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even more, and groups such as Spotify amass greater amounts of information
and deeper insights into that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been analyzed mostly from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob

import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print('found electronic music song at {0} seconds'.format(time.time() - start_time))
            count += 1
            print('song count: {0}'.format(count))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections

import numpy as np
import hdf5_getters   # not on adroit
import msd_utils      # not on adroit
import sklearn.mixture

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print('found most likely chords at {0} seconds'.format(time.time() - time_start))
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print('calculated chord changes at {0} seconds'.format(time.time() - time_start))

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print('found most likely timbre categories at {0} seconds'.format(time.time() - time_start))
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print('preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start))
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print('file merging complete at time {0}'.format(time.time() - time_start))
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import operator
import random
import collections
from collections import defaultdict
from string import ascii_uppercase

import numpy as np
import matplotlib.pyplot as plt
import hdf5_getters   # not on adroit
import msd_utils      # not on adroit
import sklearn.mixture

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print('getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start))
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print('finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start))

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import time

import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord type: (1, idx) major, (2, idx) minor,
    # (3, idx) dominant 7th, (4, idx) minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / \
                   ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats; ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin; rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1
3.2.3 α = 0.2

With α set to 0.2, a total of 22 clusters were formed. Three of the clusters consisted of one song each, none of which was particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.
Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths; atmospheric
4        343         Arpeggios
5        304         Electro ambient
6        2405        Alien synths; eerie
7        1264        Punchy kicks and claps; 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature; synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist ambient (non-electronic instruments)
14       765         Downtempo; classic guitar riffs; fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast ambient; classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms; one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of the songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
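This left-skew can be quantified directly. The following is a minimal sketch (with a hypothetical year list, not an actual cluster from this experiment) computing the sample skewness of a cluster's release years, where a negative value indicates the left skew described above:

```python
import numpy as np

def year_skewness(years):
    """Sample skewness (third standardized moment) of a list of release years.
    Negative values indicate a left-skewed distribution (a long tail of older songs)."""
    y = np.asarray(years, dtype=float)
    dev = y - y.mean()
    return np.mean(dev ** 3) / (np.mean(dev ** 2) ** 1.5)

# hypothetical cluster: mostly 2000s songs with a thin tail back to the 70s
cluster_years = [1975, 1990, 2000, 2005, 2006, 2007, 2008, 2009, 2009, 2010]
print(year_skewness(cluster_years))  # negative, i.e. left-skewed
```

Comparing this statistic per cluster against the skewness of the full MSD year distribution would make the "heavier left tail" observations below precise.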
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together around the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth looking not only at interesting phenomena in the clusters formed for that specific value, but also at how those clusters compare to the ones formed for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 consisted of vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters for the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing the clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
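The role of α in controlling how many clusters emerge can be illustrated with scikit-learn's current Dirichlet Process interface. This is only a sketch on synthetic two-dimensional data; it uses the modern BayesianGaussianMixture API, not the 2016-era sklearn.mixture interface or the actual feature set used in this thesis:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# hypothetical 2-D stand-in for the per-song chord-change/timbre features:
# three well-separated Gaussian blobs of 200 points each
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 2)) for c in (-2, 0, 2)])

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=30,                       # truncation level of the DP
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,      # the α tuned in this chapter
        max_iter=500,
        random_state=0,
    ).fit(X)
    # count components that claim an appreciable share of the songs
    used = int(np.sum(dpgmm.weights_ > 0.01))
    print(alpha, used)
```

Larger α places more prior mass on spreading songs across many components, which mirrors the progression from 8-ish clusters at α = 0.05 to 22 at α = 0.2 observed above.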
Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for resolving them; I then offer potential paths for researchers to build upon my experiment, and give closing words regarding this thesis.
4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
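Short of the full approach in [15], even a simple similarity measure over the per-song timbre-category vectors computed in Appendix A.2 would make such comparisons quantitative. A minimal sketch, with hypothetical vectors:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two per-song timbre-category count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# hypothetical timbre_cat_counts vectors (duration-normalized rates, as in A.2)
song_a = [0.40, 0.10, 0.00, 0.30, 0.20]
song_b = [0.38, 0.12, 0.01, 0.28, 0.21]  # similar instrumentation to song_a
song_c = [0.00, 0.05, 0.90, 0.02, 0.03]  # dominated by one timbre category
print(cosine_similarity(song_a, song_b))  # close to 1
print(cosine_similarity(song_a, song_c))  # close to 0
```

A cluster whose songs score uniformly high pairwise similarity would then be evidence that the Dirichlet Process grouping is acoustically meaningful, independent of my subjective listening.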
4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the songs accessed from the dataset and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A

Code

A.1 Pulling Data from the Million Song Dataset
# NOTE: path strings and the re.sub pattern below were partially lost in the
# PDF transcription (slashes were stripped); they are reconstructed approximately.
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata (including pitch and timbre segments) of every
electronic music song out of the Million Song Dataset files.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# re.sub pattern assumed to strip path separators from the subdirectory argument
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', str(sys.argv[1])) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
# NOTE: path strings and the regular expression below were partially lost in the
# PDF transcription; they are reconstructed approximately.
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# mean of a list; applied column-wise via map(mean, zip(*rows))
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
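The chord-shift encoding in the loop above can be checked in isolation. This standalone sketch reproduces the four "no change" categories (0, 60, 120, and 180) discussed in Section 3.3:

```python
def chord_shift(c1, c2):
    """Encode a transition between two chords c = (key_type, root), where
    key_type is 1=major, 2=minor, 3=dominant 7th major, 4=dominant 7th minor,
    and root is 0-11 (C through B), as one of 192 categories
    (16 key-type transitions x 12 root-note shifts)."""
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]
    return 12 * (key_shift - 1) + note_shift

# the four same-chord transitions cited in the analysis
print(chord_shift((1, 0), (1, 0)))  # 0   (major -> major)
print(chord_shift((2, 0), (2, 0)))  # 60  (minor -> minor)
print(chord_shift((3, 0), (3, 0)))  # 120 (dom7 major -> dom7 major)
print(chord_shift((4, 0), (4, 0)))  # 180 (dom7 minor -> dom7 minor)
```

Because key_shift ranges over 1-16 and note_shift over 0-11, every transition lands in exactly one of the 192 slots of the chord_changes histogram.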
A.3 Code to Compute Timbre Categories
# NOTE: path strings and the regular expression below were partially lost in the
# PDF transcription; they are reconstructed approximately.
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
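The sampling rule in this script keeps each song with probability min(1, N/year_counts[year]), so the expected number of sampled songs per year is roughly min(N, year_counts[year]), balancing the year distribution of the timbre frames. A minimal check of that rule:

```python
def sample_prob(n_target, year_count):
    """Probability with which a song is kept, so that the expected number of
    sampled songs per year is min(n_target, year_count)."""
    p = 1.0 * n_target / year_count
    return 1.0 if p > 1.0 else p

# sparse early years are kept whole; dense years are thinned to ~N songs
N = 20
print(sample_prob(N, 2))            # 1.0 -> expect both songs from a 2-song year
print(sample_prob(N, 2175) * 2175)  # expected samples from a 2175-song year, ~20
```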
A.4 Helper Methods for Calculations
1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8
9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10
11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural
12
61
13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37
38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]
39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]
40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]
41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]
42
43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]
44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]
45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]
46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]
47
# Centroids of the 46 timbre categories (12-dimensional timbre vectors).
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03],
]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw msd data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (quality, transposition); quality 1 = major,
    # 2 = minor, 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    quality_tables = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs),
    ]
    for quality, templates, means, stdevs in quality_tables:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            # Pearson-style correlation between the template and the observed pitches
            rho = 0.0
            for i in range(12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                       ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (quality, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        # Pearson-style correlation between the centroid and the observed timbre vector
        rho = 0.0
        for i in range(12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / \
                   ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
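The matching rule in the two functions above is essentially a Pearson correlation between an observed 12-dimensional vector and each template. As a minimal, self-contained illustration of that idea (plain Python rather than the numpy used above, with a hypothetical chroma vector; the helper names here are not from the thesis code):

```python
import math

# Pitch-class templates rooted at C, same convention as the appendix.
C_MAJOR = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # C, E, G
C_MINOR = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]   # C, Eb, G

def correlation(template, chroma):
    """Pearson correlation between a binary template and a chroma vector."""
    n = len(template)
    mt = sum(template) / n
    mc = sum(chroma) / n
    num = sum((t - mt) * (c - mc) for t, c in zip(template, chroma))
    st = math.sqrt(sum((t - mt) ** 2 for t in template))
    sc = math.sqrt(sum((c - mc) ** 2 for c in chroma))
    return num / (st * sc + 1e-9)

# Hypothetical segment with energy concentrated on C, E, and G.
chroma = [0.9, 0.1, 0.1, 0.2, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]
scores = {"C major": correlation(C_MAJOR, chroma),
          "C minor": correlation(C_MINOR, chroma)}
best = max(scores, key=scores.get)
```

Because the chroma energy sits on the pitch classes of the major template, `best` comes out as "C major"; the appendix functions apply the same comparison across all 48 chord templates or all 46 timbre centroids.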
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
Figure 3.5: Song year distributions for α = 0.2
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, namely which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
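The role of α throughout this section can be pictured with the Chinese Restaurant Process view of the Dirichlet Process: each new song joins an existing cluster with probability proportional to that cluster's size, or starts a new cluster with probability proportional to α, so larger values of α tend to produce more clusters. The following simulation is purely illustrative of that prior (it does not use the song data, and the α values and item count are arbitrary):

```python
import random

def crp_cluster_count(n_items, alpha, seed=0):
    """Simulate a Chinese Restaurant Process and return the number of clusters."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items assigned to cluster k
    for n in range(n_items):
        # A new cluster is created with probability alpha / (n + alpha).
        if rng.random() < alpha / (n + alpha):
            counts.append(1)
        else:
            # Otherwise join an existing cluster with probability
            # proportional to its current size.
            r = rng.random() * n
            acc = 0.0
            for k, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[k] += 1
                    break
    return len(counts)

# With the same number of items, more concentration mass yields more clusters.
few = crp_cluster_count(23000, 0.05)
more = crp_cluster_count(23000, 5.0)
```

The expected number of clusters grows roughly like α·ln(n/α), which is why the sweep over α = 0.05, 0.1, and 0.2 in this chapter changes how finely the songs are partitioned.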
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together under the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more typical year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, playing the sounds, and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3 for α = 0.1 contained a similar number of songs and a similar distribution of release years to cluster 9 for α = 0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters for α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters for the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing different clusters. The y-axis values for all of the years are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.
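The tag-based selection criterion described above (keep a song if its artist carries any tag from a predetermined EM genre list) can be sketched as a simple membership test. The genre list and the sample tags below are illustrative stand-ins, not the exact lists used in the thesis, and the `(track_id, artist_terms)` pairs mimic what the MSD's `hdf5_getters.get_artist_terms` supplies per file:

```python
# Hypothetical subset of the EM genre tag list used for filtering.
EM_GENRES = {"techno", "house", "trance", "ambient", "idm", "electro",
             "drum and bass", "acid house", "industrial"}

def is_em_artist(artist_terms):
    """Return True if any of the artist's tags falls in the EM genre list."""
    return any(term.lower() in EM_GENRES for term in artist_terms)

def filter_em_tracks(tracks):
    """Keep only tracks whose artist carries at least one EM genre tag."""
    return [tid for tid, terms in tracks if is_em_artist(terms)]

tracks = [("TR001", ["Acid House", "electronic"]),
          ("TR002", ["classic rock", "blues"]),
          ("TR003", ["Trance", "uplifting"])]
em_ids = filter_em_tracks(tracks)  # ["TR001", "TR003"]
```

As the text notes, this is an artist-level filter; the song-level Last.fm tags would allow the same test to be applied per track instead, which would exclude the non-EM songs of mostly-rock artists.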
4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and methods for accessing songs from the dataset and comparing them to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights into that information, this relatively new field of study will hopefully grow with them. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been approached mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata (including pitch and timbre
segments) for every EM song found in the Million Song Dataset.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
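The chord-change encoding used in the preprocessing above maps a pair of consecutive chords, each represented as a (chord type, root pitch class) tuple, to one of the 192 chord-change categories. A small self-contained version of that mapping (the function name here is mine, chosen for illustration):

```python
def chord_change_category(c1, c2):
    """Map consecutive chords to one of the 192 chord-change categories.
    A chord is (chord_type, root): chord_type 1=major, 2=minor,
    3=dominant 7th, 4=minor 7th; root is a pitch class 0-11."""
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]        # wrap around the octave
    key_shift = 4 * (c1[0] - 1) + c2[0]        # 1..16 type-to-type transitions
    return 12 * (key_shift - 1) + note_shift   # 0..191

# the four "same chord type, no note change" categories discussed in Section 3.3
print(chord_change_category((1, 0), (1, 0)),   # 0:   major -> major
      chord_change_category((2, 5), (2, 5)),   # 60:  minor -> minor
      chord_change_category((3, 7), (3, 7)),   # 120: dom. 7th -> dom. 7th
      chord_change_category((4, 2), (4, 2)))   # 180: minor 7th -> minor 7th
```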
A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
               1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
               1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
               1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (chord type, root)
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
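To see the template-matching idea from the helpers above in isolation, here is a miniature, self-contained version that scores only the twelve major-triad templates against a chroma vector, using the same correlation formula with the 0.01 smoothing terms. `best_major_root` is an illustrative name, not part of msd_utils.

```python
import numpy as np

# major-triad templates: a 1 marks a pitch class in the chord; root = row index
MAJOR = [[1 if (j - i) % 12 in (0, 4, 7) else 0 for j in range(12)]
         for i in range(12)]

def best_major_root(pitch_vector):
    """Return the root (0-11) of the major-triad template whose smoothed
    correlation with the 12-dimensional chroma vector is largest in
    magnitude, mirroring the loop structure of find_most_likely_chord."""
    pv = np.asarray(pitch_vector, dtype=float)
    best, rho_max = 0, 0.0
    for idx, chord in enumerate(MAJOR):
        ch = np.asarray(chord, dtype=float)
        rho = np.sum((ch - ch.mean()) * (pv - pv.mean())) / (
            (ch.std() + 0.01) * (pv.std() + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max, best = rho, idx
    return best

# a chroma frame with energy concentrated on C, E, and G matches the C template
print(best_major_root([0.9, 0.1, 0.1, 0.1, 0.8, 0.1,
                       0.1, 0.85, 0.1, 0.1, 0.1, 0.1]))  # 0
```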
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial, metal, and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the year distributions of most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with the period when the instruments were first manufactured, beginning in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, a few songs in that cluster came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
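The role the concentration parameter α plays in these results can be illustrated with the Chinese-restaurant-process view of the Dirichlet Process: each new song joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to α. The simulation below is a toy sketch of this prior alone; the thesis's actual model additionally conditions cluster assignments on each song's pitch and timbre features.

```python
import random

def crp_table_counts(n_songs, alpha, seed=0):
    """Chinese-restaurant-process simulation of a Dirichlet Process prior:
    song i joins an existing cluster with probability size/(i + alpha) and
    starts a new cluster with probability alpha/(i + alpha)."""
    rng = random.Random(seed)
    counts = []                         # counts[j] = songs in cluster j
    for i in range(n_songs):
        r = rng.uniform(0, i + alpha)
        if r < alpha or not counts:
            counts.append(1)            # open a new cluster
        else:
            acc = alpha                 # walk the clusters in proportion to size
            for j, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[j] += 1
                    break
    return counts

# a small concentration parameter yields few clusters, a large one yields many
print(len(crp_table_counts(5000, alpha=0.2)),
      len(crp_table_counts(5000, alpha=20.0)))
```

This is why sweeping α from 0.05 up to 0.2, as in Section 3.2, tends to produce progressively more, and therefore finer-grained, clusters.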
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although Clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together by a common theme of dense, melodic material (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with a hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer and pioneer of ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more
for the majority of the song The timbre categories on the other hand are more
difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8] While
this strategy worked in Mauchrsquos study given the time and resources at my disposal
this strategy was not practical in my study I ended up comparing my subjective
summaries of each cluster and comparing the charts to see whether certain peaks
in the timbre categories corresponded to specific tones Strangely for α = 005 the
timbre and chord change data is very similar for each cluster This problem does not
occur for when α = 01 or 02 where the graphs vary significantly and correspond
to some of the observed differences in the music In summary below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
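The chord-change types 0, 60, 120, and 180 discussed above follow directly from the encoding used in the appendix code: with chord qualities indexed 1–4 (major, minor, dominant 7th major, dominant 7th minor) and root notes 0–11, each change maps to one of 192 categories. A minimal sketch of that mapping (the function name is mine):

```python
def chord_change_type(quality1, root1, quality2, root2):
    """Map a chord change to one of 192 categories (12 note shifts x 16
    ordered quality pairs), mirroring the appendix encoding.
    Qualities: 1 = major, 2 = minor, 3 = dom. 7th major, 4 = dom. 7th minor."""
    # semitone shift between the two root notes, wrapped into 0-11
    note_shift = (root2 - root1) % 12
    # one of 16 ordered quality pairs, numbered 1 through 16
    key_shift = 4 * (quality1 - 1) + quality2
    return 12 * (key_shift - 1) + note_shift

# the four "same quality, no note change" types highlighted in the text
print(chord_change_type(1, 0, 1, 0))  # major -> major: 0
print(chord_change_type(2, 5, 2, 5))  # minor -> minor: 60
print(chord_change_type(3, 7, 3, 7))  # dom. 7th major -> dom. 7th major: 120
print(chord_change_type(4, 2, 4, 2))  # dom. 7th minor -> dom. 7th minor: 180
```

Frequent observations at exactly these four values therefore mean that consecutive chords tend to keep both their quality and their root.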
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
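The "sped up" comparison can also be mimicked on the MSD segment data itself: compressing a song's timeline of smoothed feature frames by a factor approximates faster playback. A purely illustrative, hypothetical helper (not part of the thesis code; real playback would also shift pitch):

```python
def speed_up(frames, factor=1.5):
    """Approximate playing a song `factor` times faster by resampling its
    sequence of per-segment feature frames (e.g. smoothed pitch vectors)."""
    n = int(len(frames) / factor)
    # pick every `factor`-th frame along the original timeline
    return [frames[int(i * factor)] for i in range(n)]

frames = [[i] * 12 for i in range(9)]  # nine dummy 12-dimensional frames
fast = speed_up(frames, 1.5)
print(len(fast))  # 6: the resampled song is 1.5x shorter
```

Comparing the resampled frames of an ambient or house track against trance tracks' frames is one way the stylistic observation above could be made systematic.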
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value but also comparing the clusters formed
to those for other values of α. Since we are increasing the value of α, more clusters
will be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster
3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs and a similar
distribution of release years to cluster 9_{0.05}. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked
up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1}
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machines, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this run, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters. Three of these clusters contained only one song each, and upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR
drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
seem to vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also support the difficulty
of distinguishing different clusters. The y-axes for all of the years are quite small,
implying that many of the timbre values averaged out because the songs in each
cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small amount of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 performed an insufficient job of
adequately clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
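The qualitative pattern across the three runs (more, finer clusters as α grows) is exactly what the Dirichlet Process predicts: under the Chinese Restaurant Process view, the expected number of clusters grows roughly like α log n. A small stdlib-only simulation (not the thesis code; the data likelihood, absent here, is what pushes the real runs to between 9 and 22 clusters) illustrates the monotone effect of α:

```python
import random

def crp_num_clusters(n, alpha, seed=0):
    """Simulate the Chinese Restaurant Process for n songs and return the
    number of clusters ('tables') formed. Illustrative only: the thesis fits
    a DP-GMM to pitch/timbre features rather than sampling a bare CRP."""
    rng = random.Random(seed)
    counts = []  # songs per cluster
    for i in range(n):
        # open a new cluster with probability alpha / (i + alpha)
        if rng.random() < alpha / (i + alpha):
            counts.append(1)
        else:
            # otherwise join an existing cluster proportionally to its size
            r = rng.random() * i
            acc = 0
            for j, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[j] += 1
                    break
    return len(counts)

for alpha in (0.05, 0.1, 0.2):
    # average over a few seeds at roughly the size of the EM subset
    avg = sum(crp_num_clusters(23000, alpha, s) for s in range(5)) / 5
    print(alpha, avg)
```

Increasing α consistently raises the number of occupied clusters, which matches the 9 → 16 → 22 progression observed above even though the absolute counts depend on the data itself.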
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; I then offer potential paths for researchers to build upon my
experiment and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my
control but unrealistic to address given the time and resources I had. The largest
issue was the dataset I was working with. While the MSD contained roughly 23,000
electronic music songs according to my classifications, these songs did not come close
to all of the electronic music that was available. From looking through the tracks, I
did see many important artists, meaning that there was some credibility to the
dataset. However, there were several other artists I was surprised to see missing,
and the artists included contained only a limited number of popular songs. Some
traditionally defined genres, like dubstep, were missing entirely from the dataset,
and the most recent songs came from the year 2010, which meant that the past 5
years of rapid expansions in EM were not accounted for. Building a sufficient corpus
of EM data is very difficult, arguably more so than for other genres, because songs
may be remixed by multiple artists, further blurring the line between original content
and modifications. For this reason, I considered my thesis to be a proof of concept.
Although the data I used may not be ideal, I was able to show that the Dirichlet
Process could be used with some amount of success to cluster songs based on their
metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayes Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the measurements on each graph for timbre, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successfully the Dirichlet
Process clusters songs into distinct categories.
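As an illustration of the kind of timbre-similarity comparison such methods enable, cosine similarity between per-song averaged timbre vectors is one simple starting point (a sketch with made-up 12-dimensional vectors, not the method of [15]):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors,
    e.g. per-song means of the MSD's 12-dimensional timbre segments."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# made-up average timbre vectors for three hypothetical songs
song_a = [0.9, 0.1] + [0.0] * 10
song_b = [0.8, 0.2] + [0.0] * 10
song_c = [0.0, 0.0, 1.0] + [0.0] * 9

print(cosine_similarity(song_a, song_b))  # close to 1: similar timbre
print(cosine_similarity(song_a, song_c))  # 0.0: no shared timbre components
```

A measure like this, applied cluster by cluster, would give a quantitative check that songs within a cluster really are closer in timbre to each other than to songs outside it.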
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether clustering was effective, and closer musical
scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps
the songs available in the Million Song Dataset, can be addressed with extensions
and modifications to the code base I created. The greater issue of building an
effective corpus of music data for the MSD, and constantly updating it, might be
addressed by soliciting such data from an organization like Spotify, but such an
endeavor is very ambitious and beyond the scope of any individual or small-group
research project without extensive funding and influence. Once these problems are
resolved, and the songs accessed from the dataset and the methods for comparing
songs to each other are settled, the next steps would be to further analyze the
results. How do the most unique artists for their time compare to the most popular
artists? Is there considerable overlap? How long does it take for a style to grow in
popularity, if it even does? And lastly, how can these findings be used to compose
new genres of music and envision who and what will become popular in the future?
All of these questions may require supplementary information sources, with respect
to the popularity of songs and artists for example, and many of these additional
pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even more, and groups such as Spotify amass greater amounts of information
and deeper insights on that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has, up
until now, been mostly analyzed from a curious outsider's perspective: qualitatively
described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm', 'idm - intelligent dance music',
                 '8-bit', 'ambient', 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in
               target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + \
    re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# pattern matches one song's dict literal; exact regex garbled in transcription
for json_object_str in re.finditer("{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in
                          segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in
                                        chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg
                   in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t
                                            in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# pattern matches one song's dict literal; exact regex garbled in transcription
json_pattern = re.compile("{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
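The two comprehensions above reduce each 12-dimensional cluster center to a single scalar mean and standard deviation. A minimal pure-Python sketch of the same reduction, using hypothetical cluster values rather than the fitted TIMBRE_CLUSTERS (statistics.pstdev matches numpy's population np.std):

```python
from statistics import pstdev

# Hypothetical stand-ins for two 12-dimensional timbre cluster centers.
timbre_clusters = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
    [0.0] * 12,
]

# Scalar mean and population standard deviation of each cluster's 12 values,
# mirroring [np.mean(t) ...] and [np.std(t) ...] above.
timbre_means = [sum(t) / len(t) for t in timbre_clusters]
timbre_stdevs = [pstdev(t) for t in timbre_clusters]

print(timbre_means)  # [6.5, 0.0]
```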
'''helper methods to process raw msd data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
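A quick sanity check of the transposition logic: rotating by the song's key index should move the tonic's chroma bin to position 0. This sketch restates transpose_by_key in self-contained form with a toy chroma vector:

```python
def transpose_by_key(pitch_seg, key):
    # rotate a 12-bin chroma vector so the song's key maps to index 0
    return [pitch_seg[(i + key) % 12] for i in range(12)]

# Chroma vector with all energy on the tonic of G (pitch class 7).
chroma = [0.0] * 12
chroma[7] = 1.0

normalized = transpose_by_key(chroma, 7)
print(normalized[0])  # 1.0 -- the tonic now sits at index 0
```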
'''given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
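The template matching above scores each chord by a correlation-like statistic and keeps the best match. A simplified, self-contained sketch of the same idea, assuming binary chord templates (1 for chord tones, 0 otherwise) rather than the learned template means and standard deviations used in the thesis:

```python
from statistics import mean, pstdev

# Binary chord templates over the 12 pitch classes (1 = chord tone).
C_MAJOR = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # C, E, G
C_MINOR = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]   # C, Eb, G

def correlation(template, pitch_vector):
    """Correlation-style score between a template and a chroma vector."""
    mt, mp = mean(template), mean(pitch_vector)
    st = pstdev(template) + 0.01
    sp = pstdev(pitch_vector) + 0.01
    return sum((template[i] - mt) * (pitch_vector[i] - mp)
               for i in range(12)) / (st * sp)

def most_likely(pitch_vector):
    scores = {'major': correlation(C_MAJOR, pitch_vector),
              'minor': correlation(C_MINOR, pitch_vector)}
    return max(scores, key=lambda k: abs(scores[k]))

# A chroma vector dominated by C, E, and G matches the major template.
chroma = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1]
print(most_likely(chroma))  # major
```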
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster | Song Count | Characteristic Sounds
0  | 4075 | Nostalgic and sad-sounding synths and string instruments
1  | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2  | 1546 | Jazz/funk tones
3  | 1691 | Orchestral with heavy 80s synths, atmospheric
4  | 343  | Arpeggios
5  | 304  | Electro, ambient
6  | 2405 | Alien synths, eerie
7  | 1264 | Punchy kicks and claps, 80s/90s tilt
8  | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9  | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791  | Cavernous, minimalist ambient (non-electronic instruments)
14 | 765  | Downtempo, classic guitar riffs, fewer synths
16 | 865  | Classic acid house sounds and beats
17 | 682  | Heavy Roland TR sounds
22 | 14   | Fast, ambient, classic orchestral
23 | 578  | Acid house with funk tones
30 | 31   | Very repetitive rhythms, one or two tones
34 | 88   | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the year distributions of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when those instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, a few songs in that cluster came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together by the common theme of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a year distribution closer to that of the entire MSD and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp, modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. "Trance" was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations gathered from several listeners [8]. While this strategy worked in Mauch's study, it was not practical in mine given the time and resources at my disposal. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar across every cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
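For reference, the chord-change type numbers quoted earlier in this section (0, 60, 120, and 180 as the four "no change" cases) are consistent with indexing the 192 = 4 × 4 × 12 change types by an ordered pair of chord qualities plus a root shift. The sketch below is my own reconstruction of that indexing, not code from the thesis appendix:

```python
QUALITIES = ['major', 'minor', 'dom7', 'min7']  # 4 chord qualities

def chord_change_index(q_from, q_to, root_shift):
    # map a chord change to one of 192 = 4 * 4 * 12 types
    i = QUALITIES.index(q_from)
    j = QUALITIES.index(q_to)
    return (i * 4 + j) * 12 + root_shift % 12

# The four "same quality, no note change" diagonals land on 0, 60, 120, 180.
print([chord_change_index(q, q, 0) for q in QUALITIES])  # [0, 60, 120, 180]
```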
While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with the existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between the clusters will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3 under α = 0.1 contained a similar number of songs, and a similar distribution of release years, to cluster 9 under α = 0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp, twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs featuring orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different in terms of traditional genres can be grouped together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters from α = 0.05. That is, a single cluster at α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre and chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong, repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong, repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music
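The effect of α observed across these runs — more clusters as the concentration parameter grows — can be illustrated with the Chinese Restaurant Process, the sequential view of the Dirichlet Process. The toy simulation below is only a sketch of that tendency, not the DP-GMM used in the experiments:

```python
import random

def crp_num_clusters(n, alpha, seed=0):
    """Number of occupied tables after seating n customers in a CRP(alpha)."""
    rng = random.Random(seed)
    tables = []  # customers per table (one entry per cluster)
    for i in range(n):
        # customer i opens a new table with probability alpha / (i + alpha)
        if rng.random() < alpha / (i + alpha):
            tables.append(1)
        else:
            # otherwise, join an existing table proportionally to its size
            r = rng.random() * i
            acc = 0
            for t in range(len(tables)):
                acc += tables[t]
                if r < acc:
                    tables[t] += 1
                    break
    return len(tables)

# A small concentration parameter opens far fewer clusters than a large one.
print(crp_num_clusters(5000, 0.5), crp_num_clusters(5000, 20.0))
```

The expected number of tables grows roughly like α log n, which is why doubling α in the experiments roughly scales up the cluster count rather than leaving it unchanged.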
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process produced 22 clusters. Three of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters mapped easily to clusters from the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing the clusters. The y-axes on these charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies for correcting them; I then offer potential paths for researchers to build upon my experiment, and close with final remarks on this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process can be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
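One lightweight way to quantify how distinct two clusters are, far simpler than the timbre-similarity models of [15], would be to compare per-cluster timbre-category histograms with cosine similarity. The histograms below are hypothetical, purely for illustration:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length histograms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical timbre-category histograms for three clusters.
cluster_a = [10, 0, 5, 0]
cluster_b = [0, 10, 0, 5]   # no categories in common with cluster_a
cluster_c = [9, 1, 5, 0]    # nearly identical to cluster_a

print(cosine(cluster_a, cluster_b))         # 0.0 -- fully distinct
print(cosine(cluster_a, cluster_c) > 0.99)  # near-duplicate clusters
```

Clusterings whose pairwise similarities all sit near 1 would exhibit exactly the indistinguishability problem observed at α = 0.2.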
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and methods for accessing songs from the dataset and comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics continues to grow, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print('found electronic music song at {0} seconds'.format(time.time() - start_time))
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# matches one serialized song dict in the raw text file
for json_object_str in re.finditer("{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1

    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished; writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
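The note_shift/key_shift arithmetic above maps each pair of consecutive chords to one of 192 categories (16 ordered pairs of chord types times 12 root shifts). A minimal Python 3 sketch of that encoding (the function name is illustrative, not from the thesis code):

```python
def chord_shift_category(c1, c2):
    """Encode a transition between two chords as a category in 0..191.

    Each chord is a (type, root) pair: type 1-4 for major, minor,
    dominant 7th, minor 7th; root 0-11 for C through B.
    """
    note_shift = (c2[1] - c1[1]) % 12      # semitones from the old root up to the new root
    key_shift = 4 * (c1[0] - 1) + c2[0]    # 1..16: ordered pair of chord types
    return 12 * (key_shift - 1) + note_shift

# repeating the same chord, for each of the four chord types
print([chord_shift_category((t, 5), (t, 5)) for t in (1, 2, 3, 4)])  # -> [0, 60, 120, 180]
```

Encoding a chord against itself lands on categories 0, 60, 120, and 180, which is why those bins dominate in songs that stay on one chord type and key for long stretches.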
A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# matches one serialized song dict in the raw text file
json_pattern = re.compile("{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
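The prob threshold above implements year-balanced sampling: each song survives with probability min(1, N / count-for-its-year), so over-represented years are thinned toward roughly N songs while sparse early years are kept in full. A small Python 3 sketch under that reading (helper names are illustrative):

```python
import random

def acceptance_prob(n_target, year_count):
    """Keep probability chosen so roughly n_target songs survive per year."""
    return min(1.0, float(n_target) / year_count)

def sample_years(years, year_counts, n_target=20, seed=0):
    """Thin a list of song years so over-represented years are downsampled."""
    rng = random.Random(seed)
    return [y for y in years
            if rng.random() < acceptance_prob(n_target, year_counts[y])]

# years with n_target or fewer songs are always kept in full
print(acceptance_prob(20, 2))  # -> 1.0
```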
A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02,
     7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02,
     8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00,
     -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01,
     1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00,
     1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01,
     9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01,
     -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02,
     6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01,
     -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01,
     -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01,
     2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03,
     1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01,
     -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01,
     6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01,
     -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02,
     -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01,
     1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02,
     3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02,
     -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02,
     -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01,
     9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02,
     -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01,
     1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02,
     -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00,
     -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01,
     1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01,
     1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01,
     -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01,
     -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03,
     4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00,
     -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00,
     -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01,
     4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02,
     1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01,
     5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03,
     -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02,
     7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02,
     -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00,
     2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01,
     -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02,
     -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01,
     -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00,
     9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01,
     -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00,
     2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00,
     1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00,
     8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01,
     -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00,
     2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00,
     -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02,
     -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01,
     1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01,
     1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01,
     -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00,
     7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00,
     -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01,
     -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02,
     -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01,
     3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01,
     4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01,
     -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02,
     3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00,
     1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01,
     1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01,
     -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00,
     -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01,
     1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03,
     4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00,
     1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01,
     -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00,
     -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00,
     -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00,
     1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00,
     2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01,
     2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01,
     -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01,
     -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02,
     1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02,
     -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02,
     2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01,
     7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01,
     -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01,
     -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01,
     5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01,
     1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02,
     -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00,
     1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01,
     -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01,
     -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02,
     -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01,
     1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02,
     -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
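The chord- and timbre-matching helpers above both score a vector against each template with a correlation-style statistic and keep the best-scoring match. The idea can be illustrated with a self-contained Python 3 sketch that matches a pitch vector against the twelve rotations of the major-triad template only (a simplification of the four-template search above; the function names are illustrative):

```python
import math

# binary major-triad template for C major; other roots are rotations of it
C_MAJOR = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]

def rotate(template, k):
    """Template for the triad rooted k semitones above C."""
    return [template[(i - k) % 12] for i in range(12)]

def pearson(x, y):
    """Pearson correlation between two 12-element vectors."""
    mx, my = sum(x) / 12.0, sum(y) / 12.0
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def best_major_root(pitch_vector):
    """Root (0=C ... 11=B) whose major-triad template correlates best."""
    return max(range(12), key=lambda k: pearson(rotate(C_MAJOR, k), pitch_vector))

# a segment dominated by G, B, and D should match the G major template (root 7)
print(best_major_root([0.1, 0, 0.9, 0, 0.1, 0, 0, 1.0, 0, 0.05, 0, 0.8]))
```

The thesis code uses a hand-rolled variant of this correlation with small constants added to the standard deviations to avoid division by zero; the structure of the search is the same.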
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-35, December 2005.
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster Song Count Characteristic Sounds
0 4075 Nostalgic and sad-sounding synths and string instru-
ments
1 2068 Intense sad cavernous (mix of industrial metal and am-
bient)
2 1546 Jazzfunk tones
3 1691 Orchestral with heavy 80s synths atmospheric
4 343 Arpeggios
5 304 Electro ambient
6 2405 Alien synths eery
7 1264 Punchy kicks and claps 80s90s tilt
8 1561 Medium tempo 44 time signature synths with intense
guitar
9 1796 Disco rhythms and instruments
10 2158 Standard rock with few (if any) synths added on
12 791 Cavernous minimalist ambient (non-electronic instru-
ments)
14 765 Downtempo classic guitar riffs fewer synths
16 865 Classic acid house sounds and beats
17 682 Heavy Roland TR sounds
22 14 Fast ambient classic orchestral
23 578 Acid house with funk tones
30 31 Very repetitive rhythms one or two tones
34 88 Very dense sound (strong vocals and synths)
Table 33 Song cluster descriptions for α = 02
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Unsurprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
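The effect of the concentration parameter can be sketched with scikit-learn's BayesianGaussianMixture, a present-day analogue of the Dirichlet Process mixture model used in this thesis (not necessarily the exact class or version used here): the truncated DP prior prunes unused components, and smaller α favors fewer clusters.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# two well-separated synthetic "feature" blobs standing in for song data
X = np.vstack([rng.randn(100, 2) + [5, 5], rng.randn(100, 2) - [5, 5]])

for alpha in (0.05, 0.1, 0.2):
    model = BayesianGaussianMixture(
        n_components=10,                       # upper bound; the DP prunes unused ones
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,      # the concentration parameter α
        random_state=0,
    ).fit(X)
    used = np.sum(model.weights_ > 0.01)       # effectively occupied clusters
    print(alpha, used)
```

On well-separated data the fitted weights typically concentrate on as many components as there are true groups, regardless of the n_components upper bound; on real song features, larger α leaves more components with non-negligible weight.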
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens: the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together around the common theme of dense melodies (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats; another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer who pioneered ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords remain in the same key for the majority of the song. The timbre categories, on the other hand, are more
difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category, playing the sounds, and attaching user-based interpretations gathered from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. Instead, I compared my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only examining interesting phenomena within the clusters formed for that specific value but also comparing them to the clusters formed at other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
49
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced differences in instrumentation and mood. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
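The role α plays in the number of clusters can be illustrated with a small prior-only simulation, which I add here as a sketch (it is not part of the thesis code). Under the Chinese Restaurant Process view of the Dirichlet Process, observation n opens a new cluster with probability α/(n + α), so larger α yields more clusters a priori; the absolute counts below come from the prior alone, not from the data-driven posterior used in the experiments.

```python
import random

def crp_cluster_count(n_songs, alpha, seed=0):
    """Simulate CRP seating and return how many clusters form."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of songs seated at cluster k
    for n in range(n_songs):
        # new cluster with probability alpha / (n + alpha)
        if rng.random() < alpha / (n + alpha):
            counts.append(1)
        else:
            # otherwise join an existing cluster w.p. proportional to its size
            r = rng.uniform(0, n)
            running = 0.0
            for k, c in enumerate(counts):
                running += c
                if r < running:
                    counts[k] += 1
                    break
    return len(counts)

for alpha in (0.05, 0.1, 0.2):
    print(alpha, '->', crp_cluster_count(20000, alpha))
```

The monotone effect of α is visible even though the exact counts vary by random seed.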
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process formed 22 clusters. Three of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies for addressing them; I then offer potential paths for researchers to build upon my experiment and close with final remarks on this thesis.
4.1 Design Flaws in the Experiment
While I made every effort to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that the dataset had some credibility. However, there were several other artists I was surprised to see missing, and the artists that were included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some success to cluster songs based on their metadata.
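As a concrete illustration of this proof of concept, the clustering step can be sketched with scikit-learn's BayesianGaussianMixture, a truncated Dirichlet Process mixture. This is a sketch rather than the thesis's exact implementation, and the synthetic two-blob data here merely stands in for the per-song chord-change/timbre feature vectors.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# stand-in for per-song feature vectors: two well-separated groups
X = np.vstack([rng.normal(-2, 1, (300, 4)), rng.normal(2, 1, (300, 4))])

dpgmm = BayesianGaussianMixture(
    n_components=10,                                  # truncation level
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.1,                   # the alpha discussed above
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
n_clusters = len(set(labels))
print('clusters used:', n_clusters)
```

The model is given more components than needed (the truncation level) and the Dirichlet Process prior shrinks the weights of the unused ones, so the number of occupied clusters is learned rather than fixed in advance.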
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
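A minimal sketch of the song-level filter described above, assuming Last.fm-style per-track tags were available as lists of strings; the EM_TAGS set here is a hypothetical abbreviation of the artist-level genre list used in Appendix A.1.

```python
# hypothetical subset of the EM genre list from Appendix A.1
EM_TAGS = {'house', 'techno', 'trance', 'dubstep', 'breakbeat', 'ambient',
           'idm', 'synthpop', 'downtempo', 'electronic'}

def is_em_song(track_tags):
    """Keep a track only if one of ITS OWN tags (not its artist's) is an EM genre."""
    return any(tag.lower() in EM_TAGS for tag in track_tags)

# a rock track by an artist who also makes EM would now be excluded
print(is_em_song(['Rock', 'Guitar']))     # False
print(is_em_song(['Techno', 'Detroit']))  # True
```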
Another, more addressable, weakness in my experiment was in graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had straightforward semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: rather than only eyeballing the timbre measurements on each graph, as I did in this thesis, I could also use listeners' classifications to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
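For reference, the BIC-based choice of the number of timbre categories mentioned above can be sketched with scikit-learn. The synthetic three-blob data below stands in for the sampled 12-dimensional timbre frames; on the real frames, the thesis found the minimum at 46 categories.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# stand-in for the sampled 12-dimensional timbre frames (three distinct sounds)
frames = np.vstack([rng.normal(loc=c, size=(200, 12)) for c in (0.0, 3.0, -3.0)])

best_k, best_bic = None, np.inf
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)  # lower BIC = better fit after complexity penalty
    if bic < best_bic:
        best_k, best_bic = k, bic

print('best number of categories:', best_k)
```

BIC trades goodness of fit against the number of parameters, so it stops rewarding extra mixture components once the true structure is captured.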
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, the dataset songs accessed, and methods for comparing songs to each other established, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and to envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights into that information, this relatively new field of study will hopefully grow with it. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep',
    'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
    'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections
import numpy as np
import hdf5_getters     # not on adroit
import sklearn.mixture
import msd_utils        # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (applied to one column at a time)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0]-1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import operator
import random
import collections
from collections import defaultdict
from string import ascii_uppercase
import numpy as np
import matplotlib.pyplot as plt
import hdf5_getters     # not on adroit
import sklearn.mixture
import msd_utils        # not on adroit

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 =  [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                        [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                        [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                        [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                        [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                        [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 =  [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                        [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                        [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                        [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                        [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                        [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
62
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
63
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235
236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237
238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in
segments_pitches]242 return segments_pitches_new243
65
244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250
''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((
                stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((
                stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((
                stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((
                stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)
                *(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
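The ρ statistic above is a smoothed, Pearson-style correlation between a template and an observed vector: the template that correlates most strongly wins. A self-contained toy check of that idea, using hypothetical one-hot chord templates (the thesis's actual CHORD_TEMPLATE_* arrays are defined elsewhere in this appendix):

```python
import numpy as np

def template_rho(template, vector):
    """Smoothed Pearson-style correlation between a 12-bin template and an
    observed 12-bin vector, with the same +0.01 smoothing terms as above."""
    t = np.asarray(template, dtype=float)
    v = np.asarray(vector, dtype=float)
    num = (t - t.mean()) * (v - v.mean())
    return np.sum(num / ((t.std() + 0.01) * (v.std() + 0.01)))

# hypothetical templates: 1s on chord tones, 0s elsewhere
c_major = [1,0,0,0,1,0,0,1,0,0,0,0]   # C, E, G
a_minor = [1,0,0,0,1,0,0,0,0,1,0,0]   # A, C, E
# an observed pitch vector with most energy on C, E, and G
observed = [0.9,0.1,0.0,0.1,0.8,0.1,0.0,0.7,0.1,0.2,0.0,0.1]

assert template_rho(c_major, observed) > template_rho(a_minor, observed)
```

Since all templates here have the same number of chord tones, the denominators match and the comparison reduces to which template's tones line up with the observed energy.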
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec. 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together around the common theme of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60 to minor → minor with no note change; type 120 to dominant 7th major → dominant 7th major with no note change; and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
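The four "no change" types called out above (0, 60, 120, 180) fall directly out of the chord-change encoding used in the preprocessing code of Appendix A.2, where a transition between two chords, each represented as a (type, root) pair with type in 1..4 (major, minor, dominant 7th major, dominant 7th minor) and root in 0..11, is indexed as 12·(key_shift − 1) + note_shift. A minimal restatement of that encoding:

```python
def chord_change_category(c1, c2):
    """Map a pair of chords, each a (type, root) tuple with type in 1..4 and
    root in 0..11, to one of the 192 chord-change categories, mirroring the
    encoding in the Appendix A.2 preprocessing code."""
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]          # wrap around the octave
    key_shift = 4 * (c1[0] - 1) + c2[0]          # 1..16 type-to-type transitions
    return 12 * (key_shift - 1) + note_shift     # 0..191

# The "stay in place" transitions named in the text:
assert chord_change_category((1, 5), (1, 5)) == 0    # major → major
assert chord_change_category((2, 5), (2, 5)) == 60   # minor → minor
assert chord_change_category((3, 5), (3, 5)) == 120  # dom7 major → dom7 major
assert chord_change_category((4, 5), (4, 5)) == 180  # dom7 minor → dom7 minor
```

Any root works in the assertions above, since a chord followed by itself always has note_shift = 0.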
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth looking not only at interesting phenomena in the clusters formed for that specific value but also at how those clusters compare to the ones formed for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each and, upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each and, upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters for the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing the clusters. The y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
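The effect of α observed throughout this section, more and finer clusters as it grows, is easiest to see in the Chinese Restaurant Process representation of the Dirichlet Process: each new song joins an existing cluster with probability proportional to the cluster's size, or opens a new cluster with probability proportional to α. A stdlib-only sketch (this is not the thesis's scikit-learn fitting pipeline, just an illustration of the prior's behavior on the rough size of the EM corpus):

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample cluster sizes for n items from a Chinese Restaurant Process
    with concentration alpha."""
    rnd = random.Random(seed)
    sizes = []
    for i in range(n):
        # total unnormalized mass: i items already seated, plus alpha for a new cluster
        r = rnd.random() * (i + alpha)
        for k, s in enumerate(sizes):
            if r < s:
                sizes[k] += 1   # join an existing cluster
                break
            r -= s
        else:
            sizes.append(1)     # open a new cluster
    return sizes

# Average number of clusters over a few runs, for n ~ the 23,000 EM songs:
for alpha in (0.05, 0.1, 0.2):
    avg = sum(len(crp_partition(23000, alpha, seed=s)) for s in range(20)) / 20
    print(alpha, avg)
```

The averages grow with α, matching the direction seen in the α = 0.05 / 0.1 / 0.2 runs; the absolute counts differ from the fitted results, since the prior says nothing about how well-separated the actual feature data is.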
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at clustering songs into distinct categories.
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and reliable access to the dataset's songs and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it grows at all? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information and deeper insights from it, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of each electronic song out of the raw MSD
HDF5 files and writes it to disk, sorted by year '''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep',
    'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
    'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in
                target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of values
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and timbre categories in
each electronic song '''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in
        segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0]-1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in
        chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg
        in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t
        in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(
            edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [ 1.38679881e-01,  3.95702571e-02,  2.65410235e-02,  7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
      8.71851698e-03, -1.17595855e-02,  1.07227900e-02,  8.75951680e-03,  5.40391877e-03,  6.17638908e-03],
    [ 3.14344510e+00,  1.17405599e-01,  4.08053561e+00, -1.77934450e+00,  2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00,  7.75743158e-01,  6.42796685e-01,  1.40794256e-01,  3.37716831e-01, -3.27103815e-01],
    [ 3.56548165e-01,  2.73288705e+00,  1.94355982e+00,  1.06892477e+00,  9.89739475e-01, -8.97330631e-02,
      8.73234495e-01, -2.00747009e-03,  3.44488367e-01,  9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [ 4.22442037e-01,  4.14115783e-01,  1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02,  9.86816882e-02,  2.43219098e-02,  6.93558977e-02,  6.80121418e-03,  3.97485360e-02],
    [ 1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,  1.92334219e-01, -2.83421048e-01,
      2.15787541e-01,  1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [ 1.96565167e-01,  4.98702717e-02, -3.43697282e-01,  2.54170701e-01,  1.12441266e-02,  1.54740401e-01,
     -4.70447408e-02,  8.10868802e-02,  3.03736697e-03,  1.43974944e-03, -2.75044913e-02,  1.48634678e-02],
    [ 2.21364497e-01, -2.96205105e-01,  1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02,  1.66886836e-01,  6.46238260e-02,  1.24093863e-02, -2.09274345e-02],
    [ 2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,  4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01,  3.25075635e-02,  2.01416694e-02, -5.69216463e-02,  2.61922912e-02,  8.30817468e-02],
    [ 1.62304042e-01, -7.34813956e-03, -2.02552550e-01,  1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02,  1.02883628e-02,  3.84878478e-02, -8.72920419e-03,  2.37291230e-02],
    [ 1.69023095e-01,  6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03,  1.36407740e-01,
      2.58515825e-02, -4.10328777e-04,  2.93149920e-02, -1.97874734e-02,  2.01177066e-02,  4.29260690e-03],
    [ 4.16829358e-01, -1.28384095e+00,  8.86081556e-01,  9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02,  3.47472066e-02, -3.53047665e-02,  5.58354602e-02, -5.06222122e-02],
    [ 3.83948137e-01,  1.06020034e-01,  4.01191058e-01,  1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
      2.27589858e-02, -5.67352733e-02,  3.84666644e-02, -2.15828055e-02, -1.67817151e-02,  1.15426241e-01],
    [ 9.07946444e-01,  3.26120397e+00,  2.98472002e+00, -1.42615404e-01,  1.29886103e+00, -4.53380431e-01,
      1.54008478e-01, -3.55297093e-02, -2.95809181e-01,  1.57037690e-01, -7.29692046e-02,  1.15180285e-01],
    [ 1.60870896e+00, -2.32038235e+00, -7.96211044e-01,  1.55058968e+00, -2.19377663e+00,  5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [ 6.42870687e-01,  1.34486839e+00,  2.16026845e-01, -2.13180345e-01,  3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04,  4.95054274e-03,  4.67013422e-02, -1.80823854e-02,  1.25808320e-01],
    [ 1.16780496e+00,  2.28141229e+00, -3.29418720e+00, -1.54239912e+00,  2.12372153e-01,  2.51116768e+00,
      1.84273560e+00, -4.06183916e-01,  1.19175125e+00, -9.24407446e-01,  6.85444429e-01, -6.38729005e-01],
    [ 2.39097414e-01, -1.13382447e-02,  3.06327342e-01,  4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
      3.46533705e-02,  1.46440386e-02,  6.88291154e-02,  1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [ 1.74850329e-01, -1.86077411e-01,  2.69285838e-01,  5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03,  1.20157063e-02,  2.01341977e-02],
    [ 1.93220674e-01,  1.62738332e-01,  1.72794061e-02,  7.89933755e-02,  1.58494767e-01,  9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02,  2.84895062e-02],
    [ 3.31179197e+00, -1.56765268e-01,  4.42446188e+00,  2.05496297e+00,  5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00,  5.41756637e-01, -3.15541339e-02, -1.58404846e+00,  7.37887234e-01],
    [ 2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,  5.58739133e-01, -3.45340886e-01,
      2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01,  9.92353986e-03,  1.01775476e-01],
    [ 3.13672832e+00,  1.55128891e+00,  4.60139512e+00,  9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [ 8.50714148e-01,  2.28658856e-01, -3.65260753e+00,  2.70626948e+00, -1.90441544e-01,  5.66625676e+00,
      1.77531510e+00,  2.39978921e+00,  1.10965660e+00,  1.58484130e+00, -1.51579214e-02,  8.64324026e-01],
    [ 1.14302559e+00,  1.18602811e+00, -3.88130412e+00,  8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
      8.56022598e-01, -1.08015106e+00,  1.74840192e-01, -1.35493558e-02, -1.17012561e+00,  1.68572940e-01],
    [ 3.54117814e+00,  6.12714769e-01,  7.67585243e+00,  2.50391333e+00,  1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01,  6.78297871e-01],
    [ 8.31843044e-01,  4.41635485e-01,  7.00724425e-02, -4.72159900e-02,  3.08326493e-01, -4.47009822e-01,
      3.27806057e-01,  6.52370380e-01,  3.28490360e-01,  1.28628172e-01, -7.78065861e-02,  6.91343399e-02],
    [ 4.90082031e-01, -9.53180204e-01,  1.76970476e-01,  1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
      3.91808304e-01,  2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01,  3.18934196e-01],
    [ 1.49899454e+00, -4.30708817e-01,  2.43770498e+00,  7.03149621e-01, -2.28827845e+00,  2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00,  8.20855264e-01, -2.35859902e-01],
    [ 1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,  6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03,  2.48094288e-02, -3.09576314e-02, -1.66369415e-03,  1.48904987e-04, -1.42151992e-02],
    [ 6.52394765e-01, -6.81024464e-01,  6.36868117e-01,  3.04950208e-01,  2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01,  2.04399765e-01,  4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [ 2.06762180e-01, -2.08101829e-01,  2.61977630e-01, -1.71672300e-01,  5.61794250e-02,  2.13660185e-01,
      3.90259585e-02,  4.78176392e-02,  1.72812607e-02,  3.44052067e-02,  6.26899067e-03,  2.48544728e-02],
    [ 7.39717363e-01,  4.37786285e+00,  2.54995502e+00,  1.13151212e+00, -3.58509503e-01,  2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01,  1.07942098e-03,  2.70174668e-01,  1.87279353e-01],
    [ 1.25593809e+00,  6.71054880e-02,  8.70352571e-01, -4.32607959e+00,  2.30652217e+00,  5.47476105e+00,
     -6.11052479e-01,  1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01,  9.68935954e-01],
    [ 1.17233757e-01, -1.23897829e-01, -4.88625265e-01,  1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02,  5.70221674e-02, -7.67796123e-03,  4.17505873e-02, -2.33375716e-02,  1.94121001e-02],
    [ 1.67511025e+00, -2.75436700e+00,  1.45345593e+00,  1.32408871e+00, -1.66172505e+00,  1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02,  1.39449745e+00],
    [ 3.20587677e+00, -2.84451104e+00,  8.54849957e+00, -4.44001235e-01,  1.04202144e+00,  7.35333682e-01,
     -2.48763292e+00,  7.38931361e-01, -1.74185596e+00, -1.07581842e+00,  2.05759299e-01, -8.20483513e-01],
    [ 3.31279737e+00, -5.08655734e-01,  6.61530870e+00,  1.16518280e+00,  4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01,  2.78890594e+00,  2.04189275e+00, -3.80003033e-01,  1.16034914e+00],
    [ 1.79522019e+00, -8.13534697e-02,  4.37167420e-01,  2.26517020e+00,  8.85377295e-01,  1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00,  2.60097913e-01,  9.34596450e-01],
    [ 3.50400906e-01,  8.17891485e-01, -8.63487084e-01, -7.31760701e-01,  9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02,  6.65930095e-02,  1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [ 2.25922929e-01,  2.78461593e-01,  5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
      2.31027499e-03,  5.87465112e-05,  1.86127188e-02,  2.83074747e-02, -1.87198676e-04,  1.24761782e-02],
    [ 4.53615634e-01,  3.18976020e+00, -8.35029351e-01,  7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00,  1.00044304e+00, -4.04084981e-01, -4.86030348e-01,  1.05412721e-01,  5.63666445e-02],
    [ 3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01,  1.42245002e-01,  5.02984582e-02, -8.05878851e-03,  1.95534173e-01],
    [ 1.86913010e-01, -1.61000977e-01,  5.95612425e-01,  1.87804293e-01,  2.22064227e-01, -1.09008289e-01,
      7.83845058e-02,  5.15228647e-02, -8.18113578e-02, -2.37860551e-02,  3.41013800e-03,  3.64680417e-02],
    [ 3.32919314e+00, -2.14341251e+00,  7.20913997e+00,  1.76143734e+00,  1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01,  8.00557250e-01,  1.15628886e+00],
    [ 4.76496444e-01, -1.19334793e-01,  3.09037235e-01, -3.45545294e-01,  1.30114716e-01,  5.06895559e-01,
      2.12176840e-01, -4.14296750e-03,  4.52439064e-02, -1.62163990e-02,  6.93683152e-02, -5.77607592e-03],
    [ 3.00019324e-01,  5.43432074e-02, -7.72732930e-01,  1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01,  2.78202425e-01,  6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
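The template-correlation idea behind find_most_likely_chord can be illustrated in miniature. The sketch below is hypothetical (the helper name best_major_chord and the example chroma frame are not from the thesis) and uses only the twelve major-chord templates, but it follows the same Pearson-correlation matching as the appendix code:

```python
import numpy as np

# Generate the twelve major-chord templates by rotating the C major
# template (ones at C, E, G) through all twelve roots.
MAJOR_TEMPLATES = [np.roll([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], k) for k in range(12)]

def best_major_chord(pitch_vector):
    """Return the root (0 = C .. 11 = B) of the best-matching major template."""
    pitch_vector = np.asarray(pitch_vector, dtype=float)
    # Pearson correlation between the frame and each template, analogous
    # to the rho accumulation in find_most_likely_chord
    rhos = [np.corrcoef(t, pitch_vector)[0, 1] for t in MAJOR_TEMPLATES]
    return int(np.argmax(np.abs(rhos)))

# A chroma frame with energy concentrated on C, E and G
frame = [0.9, 0.05, 0.1, 0.0, 0.8, 0.1, 0.0, 0.85, 0.05, 0.1, 0.0, 0.05]
print(best_major_chord(frame))  # -> 0, the C major template
```

The full appendix version extends this with minor, dominant 7th, and minor 7th template banks and returns a (family, root) pair rather than just the root.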
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
Figure 3.6: Timbre and pitch distributions for α = 0.2
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process these songs were lumped together with the common theme of dense, melodic
sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and John Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note
rhythm that, combined with the ethereal synths that contain certain chord
progressions, sounded very similar to trance music. In fact, I found that stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
music was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change; type 60, minor → minor with no note change; type
120, dominant 7th major → dominant 7th major with no note change; and type 180,
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that chords in the song occurring next to each other are remaining in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to intuitively interpret. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
this strategy was not practical in my study. I ended up comparing my subjective
summaries of each cluster and comparing the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data is very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed, and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
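As a check on the chord-change numbering discussed above, the indexing from Appendix A.2 can be restated as a small free function (the name chord_shift is hypothetical; chords are encoded as (family, root) pairs, with family 1–4 for major, minor, dominant 7th major, and dominant 7th minor, and root 0–11 for C through B, as in the appendix code):

```python
def chord_shift(c1, c2):
    """Encode a transition between two chords as one category index."""
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]      # wrap around the octave
    key_shift = 4 * (c1[0] - 1) + c2[0]      # 1..16 over the 4x4 family pairs
    return 12 * (key_shift - 1) + note_shift

print(chord_shift((1, 0), (1, 0)))  # major -> major, same root: 0
print(chord_shift((2, 5), (2, 5)))  # minor -> minor, same root: 60
print(chord_shift((3, 7), (3, 7)))  # dom7 major -> dom7 major, same root: 120
print(chord_shift((4, 2), (4, 2)))  # dom7 minor -> dom7 minor, same root: 180
```

This reproduces exactly why the peaks land at 0, 60, 120, and 180 for key-preserving songs: each same-family, same-root transition falls at a multiple of 60.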
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
For larger values of α, it is not only worth looking at interesting phenomena in
the clusters formed for that specific value, but also comparing the clusters formed
to other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster
3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar
distribution of the years the songs were released to cluster 9_0.05. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 group differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked
up more nuanced instrumentation and mood differences. For example, cluster 16_0.1
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and the characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this cluster, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters (10 out of 19) contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17.02, which contains Roland TR drum machine sounds and is comparable to cluster 28.01. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also reflect the difficulty of distinguishing clusters: the y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28.01, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small fraction of the dataset. I therefore concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs.

Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
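The effect of the concentration parameter α on the number of clusters can be reproduced in spirit with scikit-learn's truncated Dirichlet Process mixture, the BayesianGaussianMixture estimator. This is a minimal sketch on synthetic data, not the thesis's actual chord-change and timbre features; the feature dimension, cluster locations, and truncation level are invented for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Toy stand-in for per-song feature vectors: three synthetic "styles" in 4 dimensions.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 4)) for c in (0.0, 2.0, 4.0)])

cluster_counts = {}
for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=30,                       # truncation level, not the final cluster count
        weight_concentration_prior=alpha,      # the Dirichlet Process concentration parameter
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="diag",
        max_iter=500,
        random_state=0,
    ).fit(X)
    # Count only the mixture components that actually received songs.
    cluster_counts[alpha] = len(np.unique(dpgmm.predict(X)))
print(cluster_counts)
```

Larger α values allow more of the truncated components to receive weight, which is the mechanism behind the 22 clusters observed at α = 0.2.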
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies for addressing them; I then offer potential paths for researchers to build upon my experiment and close with final remarks on this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, which lends some credibility to the dataset. However, there were several other artists I was surprised to find missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I consider my thesis a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process can be used with some success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a predetermined list of EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
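The difference between artist-level and song-level filtering can be sketched in a few lines. The tag set and the helper below are hypothetical illustrations, not the thesis's actual filtering code or the Last.fm schema.

```python
# Hypothetical, abbreviated EM tag list for illustration.
EM_TAGS = {"house", "techno", "trance", "ambient", "drum and bass", "idm"}

def is_em_song(song_tags, artist_tags):
    """Prefer song-level tags (e.g. from a Last.fm-style companion dataset);
    fall back to artist-level tags only when no song tags exist."""
    tags = song_tags if song_tags else artist_tags
    return any(t.lower() in EM_TAGS for t in tags)

# A rock song by an artist who also makes EM is now excluded...
assert not is_em_song(["rock", "guitar"], ["techno", "rock"])
# ...whereas a song with no tags of its own still falls back to the artist filter.
assert is_em_song([], ["techno"])
```

An artist-level filter alone would have admitted the rock song above, since the artist carries a "techno" tag; the song-level check removes exactly the false positives described here.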
Another, more addressable weakness in my experiment was the graphical analysis of the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had straightforward semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
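The BIC-based choice of the number of timbre categories mentioned above can be sketched with scikit-learn's GaussianMixture: fit mixtures of increasing size and keep the one with the lowest BIC. The synthetic 12-dimensional "frames" below stand in for the thesis's normalized timbre segments; the cluster locations and candidate range are invented for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Toy stand-in for normalized 12-dimensional timbre frames: three synthetic categories.
frames = np.vstack([rng.normal(loc=m, scale=0.2, size=(150, 12)) for m in (-1.0, 0.0, 1.0)])

bics = {}
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, covariance_type="diag",
                          random_state=0).fit(frames)
    bics[k] = gmm.bic(frames)   # lower BIC = better fit after the complexity penalty
best_k = min(bics, key=bics.get)
print(best_k)
```

In the thesis this procedure, run over a much wider range of candidate sizes on real timbre data, selected 46 categories; associating each selected category with an audible sound is the step that remains open.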
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the limited set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and once the methods for accessing songs from the dataset and comparing them to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow with it. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of each electronic song from the MSD
and writes it to disk sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

def _rotations(base):
    # all 12 transpositions of a binary pitch-class chord template
    return [[base[(i - k) % 12] for i in range(12)] for k in range(12)]

# written out as explicit 12x12 binary matrices in the original thesis;
# generated here by rotation, which yields the identical templates
CHORD_TEMPLATE_MAJOR = _rotations([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])  # root, major third, fifth
CHORD_TEMPLATE_MINOR = _rotations([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0])  # root, minor third, fifth
CHORD_TEMPLATE_DOM7 = _rotations([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0])   # dominant seventh
CHORD_TEMPLATE_MIN7 = _rotations([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0])   # minor seventh

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
# centroids of the timbre categories, one 12-dimensional vector per category
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03],
]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord family: (1, idx) major, (2, idx) minor, (3, idx) dom7, (4, idx) min7
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths; atmospheric
4         343          Arpeggios
5         304          Electro ambient
6         2405         Alien synths; eerie
7         1264         Punchy kicks and claps; 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo; classic guitar riffs; fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient; classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms; one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2
3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset: mainly, which artists and songs were unique, and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s, and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
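The left-skew claim above can be checked per cluster with a simple sample-skewness computation. A minimal sketch, where the (cluster, release year) pairs are hypothetical stand-ins for the Dirichlet Process assignments, not the thesis data:

```python
from collections import defaultdict

def skewness(xs):
    """Sample skewness: a negative value indicates a left-skewed
    distribution (a long tail of earlier years)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

# Hypothetical (cluster, release year) assignments standing in for a
# Dirichlet Process run over the MSD's EM songs.
assignments = [(0, 1978), (0, 1995), (0, 2008), (0, 2009), (0, 2010),
               (1, 1980), (1, 1982), (1, 1984), (1, 1990), (1, 2009)]

years_by_cluster = defaultdict(list)
for cluster, year in assignments:
    years_by_cluster[cluster].append(year)

for cluster, years in sorted(years_by_cluster.items()):
    print(cluster, round(skewness(years), 3))
```

With these toy numbers, cluster 0 comes out left-skewed (a few early songs dragging the tail back) while cluster 1 comes out right-skewed, which is the kind of per-cluster contrast the text describes.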
First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic textures (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster.

Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music.

In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:
- Jean-Michel Jarre: ambient and house music; complicated synthesizer arrangements
- Cabaret Voltaire: orchestral electronic music
- Paul Horn: new age
- Brian Eno: ambient music
- Manuel Göttsching (Ashra): synth-heavy ambient music
- Killing Joke: industrial metal
- John Foxx: minimalist and dark electronic music
- Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those of other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.

The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

- Bill Nelson: minimalist house music
- Vangelis: orchestral compositions with electronic notes
- Rick Wakeman: rock compositions with spacey-sounding synths
- Kraftwerk: synth-based pop music
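The cluster-to-cluster mapping between two α runs that this comparison relies on can be made explicit by counting how often the two labelings co-occur on the same songs. A small sketch; the label lists below are hypothetical, not the actual thesis assignments:

```python
from collections import Counter

# Hypothetical cluster labels for the same ten songs under two runs;
# these stand in for the alpha = 0.05 and alpha = 0.1 assignments.
labels_a005 = [0, 0, 0, 9, 9, 9, 3, 3, 8, 8]
labels_a01  = [2, 2, 5, 3, 3, 3, 7, 7, 1, 1]

# Count how often each alpha = 0.05 cluster co-occurs with each
# alpha = 0.1 cluster over the shared songs.
overlap = Counter(zip(labels_a005, labels_a01))

# For every alpha = 0.05 cluster, keep the alpha = 0.1 cluster with the
# largest overlap; iterating in ascending count order lets the largest
# count overwrite the smaller ones.
best_match = {}
for (a, b), n in sorted(overlap.items(), key=lambda kv: kv[1]):
    best_match[a] = b

print(best_match)
```

A cluster whose songs scatter across many clusters of the other run (rather than concentrating in one best match) is exactly the "does not easily map" case described above.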
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α runs, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing clusters: the y-axis values are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as, for example, those of cluster 28_{0.1}. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs.

Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
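The qualitative pattern driving this whole chapter, namely that a larger concentration parameter α yields more (and finer-grained) clusters, can be illustrated with a plain Chinese Restaurant Process simulation. This is a sketch of the underlying prior only: the thesis's variational Dirichlet Process mixture (via scikit-learn) parameterizes α differently, so the simulated cluster counts will not match the 13-22 clusters reported above, but the monotone effect of α is the same.

```python
import random

def crp_cluster_count(n_songs, alpha, seed=0):
    """Simulate a Chinese Restaurant Process: song i starts a new
    cluster with probability alpha / (alpha + i), otherwise it joins
    an existing cluster with probability proportional to its size."""
    rng = random.Random(seed)
    sizes = []  # sizes[k] = number of songs assigned to cluster k
    for i in range(n_songs):
        if rng.random() < alpha / (alpha + i):
            sizes.append(1)  # open a new cluster
        else:
            r = rng.random() * i  # pick an existing song uniformly
            acc = 0
            for k, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[k] += 1
                    break
    return len(sizes)

for alpha in (0.5, 5.0, 50.0):
    print(alpha, crp_cluster_count(2000, alpha))
```

The expected number of clusters grows roughly like α log(n), so sweeping α up by an order of magnitude multiplies the cluster count accordingly, mirroring the 0.05 / 0.1 / 0.2 comparison in this chapter.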
Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment, and offer closing words regarding this thesis.
4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but impractical to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal.

The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster.

Finally, while my feature selection contained careful preprocessing based on other studies that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.
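The BIC-based choice of 46 timbre categories mentioned above can be reproduced in miniature. The sketch below uses the standard BIC formula with the parameter count of a full-covariance Gaussian mixture over 12-dimensional timbre vectors; the log-likelihood values are illustrative stand-ins chosen so that k = 46 wins, mirroring the thesis finding, not fitted numbers.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

# Hypothetical fits: log-likelihoods of Gaussian mixtures with k timbre
# categories over n = 5000 twelve-dimensional timbre frames.
n, d = 5000, 12
log_liks = {20: -162000.0, 46: -148000.0, 60: -147800.0}

scores = {}
for k, ll in log_liks.items():
    # full-covariance mixture: (k-1) weights, k*d means,
    # and k symmetric d-by-d covariance matrices
    n_params = (k - 1) + k * d + k * d * (d + 1) // 2
    scores[k] = bic(ll, n_params, n)

best_k = min(scores, key=scores.get)
print(best_k)
```

The penalty term grows linearly in the parameter count, so past some k the marginal likelihood gain no longer pays for the extra components, which is exactly why the criterion bottoms out at an intermediate category count.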
4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems with the dataset, the songs accessed from it, and the methods for comparing songs to each other are resolved, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the pitch, timbre, and song metadata for every
electronic music song found in the MSD and writes it to disk, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# NOTE: the re.sub pattern below is garbled in the transcript
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the Dirichlet Process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# NOTE: the full regex for matching one song's dict is garbled in the transcript
for json_object_str in re.finditer(r"'title'", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# NOTE: the full regex for matching one song's dict is garbled in the transcript
json_pattern = re.compile(r"'title'", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01,
     # remainder of this listing is cut off in the transcript
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw MSD data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
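Once each time segment has been labeled by `find_most_likely_chord`, the chord-change histograms used in the analysis chapter reduce to counting transitions between neighboring labels. A minimal sketch of that bookkeeping; the `count_chord_changes` helper and the toy label sequence are my own illustration, not thesis code (the thesis further encodes each transition as a numeric change type):

```python
# Hypothetical sketch: find_most_likely_chord() returns a (family, root)
# pair (1 = major, 2 = minor, 3 = dom7, 4 = min7); chord-change data is
# then just a tally of transitions between consecutive segment labels.
from collections import Counter

def count_chord_changes(chord_sequence):
    """Tally transitions between consecutive chord labels; a self-transition
    such as (1, 0) -> (1, 0) is a 'no note change' within the major family."""
    changes = Counter()
    for prev, curr in zip(chord_sequence, chord_sequence[1:]):
        changes[(prev, curr)] += 1
    return changes

# a toy song that sits mostly on C major (family 1, root 0)
labels = [(1, 0), (1, 0), (2, 9), (1, 0), (1, 0)]
counts = count_chord_changes(labels)
print(counts[((1, 0), (1, 0))])  # 2
```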
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
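Both functions above score a 12-dimensional vector against each stored template with a Pearson-style correlation and keep the best match. A self-contained illustration of that idea; the `best_template` helper and its two toy templates are invented for the demo (the real code scores against `TIMBRE_CLUSTERS` and the chord templates):

```python
# Demo only: score a 12-dim vector against each template with a
# normalized dot product and return the index of the best match.
import numpy as np

def best_template(vector, templates):
    v = (vector - np.mean(vector)) / (np.std(vector) + 0.01)
    scores = []
    for t in templates:
        t_norm = (t - np.mean(t)) / (np.std(t) + 0.01)
        scores.append(abs(np.dot(t_norm, v)))  # abs(), as in the code above
    return int(np.argmax(scores))

templates = np.array([
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],  # made-up template 0
    [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0],  # made-up template 1
], dtype=float)
noisy = templates[1] + 0.05  # a slightly shifted copy of template 1
print(best_template(noisy, templates))  # 1
```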
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec. 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
3.3 Analysis
Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks) [13] coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.
First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process these songs were lumped together with the common theme of dense, melodic
arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most
prominent artists from the earlier songs are Ashra and Jon Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and
electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-
note rhythm that, combined with the ethereal synths that contain certain chord
progressions, sounded very similar to trance music. In fact, I found that stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
music was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change; type 60, minor → minor with no note change; type 120,
dominant 7th major → dominant 7th major with no note change; and type 180,
dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that chords in the song occurring next to each other are remaining in the same key
for the majority of the song. The timbre categories, on the other hand, are more
difficult to intuitively interpret. Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in mine. I ended up taking my subjective
summaries of each cluster and comparing the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the
timbre and chord change data is very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
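The α-versus-cluster-count behavior discussed throughout this section can be sketched with scikit-learn's truncated Dirichlet Process mixture, `BayesianGaussianMixture`. This is an illustration only, on toy 2-D data standing in for the real per-song features, and not necessarily the exact class or settings the thesis pipeline used:

```python
# Illustrative: a larger concentration alpha lets the (truncated) Dirichlet
# Process spread weight over more effective mixture components.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# toy stand-in for the per-song feature vectors
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 2)) for c in (0, 4, 8)])

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=30,  # truncation level: an upper bound on clusters
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        random_state=0,
    ).fit(X)
    used = int(np.sum(dpgmm.weights_ > 1e-2))  # components with real weight
    print(alpha, used)
```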
For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value but also comparing those clusters
to the ones formed at other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely, while others were more difficult to interpret. For example, cluster
3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar
distribution of release years, to cluster 9_{0.05}. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 group differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked
up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1}
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New World, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this run, I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters. Three of these clusters contained only one song each, and upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR
drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often seem
to vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
in distinguishing different clusters. The y-axes for all of the years are quite small,
implying that many of the timbre values averaged out because the songs were quite
different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small amount of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 performed an insufficient job of
adequately clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and resulting in clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; I then offer potential paths for researchers to build upon my
experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to all of the electronic
music that was available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included contained
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the measurements on each graph for timbre, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is at effectively clustering songs into distinct categories.
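The choice of 46 timbre categories mentioned above follows the standard pattern of fitting mixtures at several candidate sizes and keeping the BIC minimizer. A toy sketch of that pattern; the data and the candidate range here are illustrative stand-ins, not the thesis's actual timbre features:

```python
# Sketch: pick the mixture size with the lowest Bayesian Information
# Criterion. (The thesis reports a minimum at 46 on the real timbre data;
# this toy data has three well-separated 12-dimensional clusters.)
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(150, 12)) for c in (-3, 0, 3)])

bics = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics[k] = gm.bic(X)  # lower BIC = better fit-vs-complexity tradeoff

best_k = min(bics, key=bics.get)
print(best_k)
```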
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions, procedures,
evaluations of whether clustering was effective, and closer musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. The greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, and the dataset,
songs accessed from the dataset, and methods for comparing songs to each other are
in place, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists, for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even more, and groups such as Spotify amass greater amounts of information
and deeper insights on that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import hdf5_getters  # not on adroit
import numpy as np

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata (title, artist, year, duration,
pitches, and timbre) for each electronic song in the MSD and writes it
to a file, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean value of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric bayes and the dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
melodies (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths that contain certain chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords in the song occurring next to each other are remaining in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, this strategy was not practical in my study. I ended up comparing my subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
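Returning to the chord-change categories discussed earlier in this section: a chord is represented as a (quality, root) pair, with quality 1 through 4 (major, minor, dominant 7th, minor 7th) and root 0 through 11, and each transition between consecutive chords maps to one of 192 categories. A minimal sketch of that mapping, mirroring the computation in Appendix A.2 (the function name is illustrative):

```python
def chord_change_category(c1, c2):
    """Map a transition between two (quality, root) chords to one of
    192 categories; quality is 1-4, root is 0-11."""
    # semitones from the first chord's root up to the second's (0-11)
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    # which of the 4 x 4 quality transitions occurred (1-16)
    key_shift = 4 * (c1[0] - 1) + c2[0]
    return 12 * (key_shift - 1) + note_shift
```

A major chord repeating on the same root maps to category 0, minor to 60, dominant 7th to 120, and minor 7th to 180, matching the frequently observed peaks described above.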
While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).
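The clusters discussed throughout this section were produced by a Dirichlet Process Gaussian Mixture Model fit with scikit-learn [10]. As a rough sketch of how the concentration parameter α controls the number of clusters, the following uses scikit-learn's `BayesianGaussianMixture` with a Dirichlet process prior on synthetic data; the two-cloud toy data is an illustrative assumption, not the thesis's actual chord-change and timbre features, and the estimator class may differ from the one used at the time:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# two well-separated synthetic feature clouds standing in for song features
X = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
               rng.normal(5.0, 0.3, size=(100, 2))])

# weight_concentration_prior plays the role of alpha: smaller values
# favor concentrating the data into fewer effective clusters
dpgmm = BayesianGaussianMixture(
    n_components=10,  # truncation level of the stick-breaking representation
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.05,
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
print('effective clusters used:', len(np.unique(labels)))
```

With larger values of the prior (playing the role of α = 0.1 or 0.2, as in the runs compared below), the model tends to retain more of the available components, yielding more, and more nuanced, clusters.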
For larger values of α, it is not only worth looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of the years the songs were released to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 301 they were different from the earliest artists in cluster 9005
One particular artist Bill Nelson stood out for having a particularly novel song
ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and
twangy synth beat that when sped up sounded like minimalist acid house music
While the α = 0.05 run differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked
up more nuanced differences in instrumentation and mood. For example, cluster 16.01
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds. Another cluster, 28.01, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6.01, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New World, also
contained vocal-free, ethereal, space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this cluster I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value the Dirichlet Process formed
22 clusters. Three of these clusters contained only one song each; upon listening
to each of these songs I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to differentiate subjectively, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17.02, which contains Roland TR
drum machine sounds and is comparable to cluster 28.01. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also support the difficulty
in distinguishing different clusters. The y-axes for all of the years are quite small,
implying that many of the timbre values averaged out because the songs in each
cluster were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as cluster 28.01, for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. Therefore I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters that a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
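The effect of α seen above, where a larger concentration parameter yields more clusters, is a property of the Dirichlet Process prior itself. The sketch below is not the thesis's model (which fit a Dirichlet Process Gaussian Mixture to pitch and timbre features); it is a minimal pure-Python simulation of the Chinese Restaurant Process view of the prior, and the cluster counts it prints reflect only the prior's tendency, not the data-driven counts reported above.

```python
import random

def crp_partition(n_songs, alpha, rng):
    """Chinese Restaurant Process: song i joins an existing cluster with
    probability proportional to that cluster's size, or starts a new
    cluster with probability proportional to alpha."""
    cluster_sizes = []
    for i in range(n_songs):
        r = rng.random() * (i + alpha)
        for k, size in enumerate(cluster_sizes):
            if r < size:
                cluster_sizes[k] += 1
                break
            r -= size
        else:                      # no existing cluster chosen
            cluster_sizes.append(1)
    return cluster_sizes

rng = random.Random(0)
for alpha in (0.1, 1.0, 10.0):
    mean_k = sum(len(crp_partition(1000, alpha, rng)) for _ in range(20)) / 20.0
    print(alpha, mean_k)   # larger alpha -> more clusters on average
```

Under this prior the expected number of clusters grows roughly like α·ln(n), which is why small changes in α (0.05 vs. 0.1 vs. 0.2) noticeably change how finely the model is willing to partition the songs.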
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies for fixing
those weaknesses; then I offer potential paths for researchers to build upon my
experiment, and close with final remarks on this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, various factors
limited it, some beyond my control and others within my control but unrealistic to
address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the
electronic music available. From looking through the tracks I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included were
represented by only a limited number of popular songs. Some traditionally defined
genres, like dubstep, were missing entirely from the dataset, and the most recent
songs came from the year 2010, which meant that the past 5 years of rapid expansion
in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason I considered my thesis to be a proof of concept: although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criterion
involved iterating through every song and selecting those whose artist carried a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music in rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayes Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the measurements on each graph for timbre, as I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to analyze more quantitatively how successful the Dirichlet
Process is at clustering songs into distinct categories.
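The model-selection step mentioned above, choosing the number of timbre categories by the Bayes Information Criterion, can be illustrated in miniature. The following sketch is not the thesis code: it fits a one-dimensional Gaussian mixture by EM on synthetic two-mode data (all names here are illustrative) and scores each candidate k with BIC = p·ln(n) − 2·ln L, where p counts the free parameters; the lowest score selects k.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, k, iters=60):
    """Fit a k-component 1-D Gaussian mixture by EM; return the log-likelihood."""
    # evenly spread initial means across the data range, unit variances
    lo, hi = min(data), max(data)
    mus = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            probs = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            s = sum(probs) or 1e-300
            resp.append([p / s for p in probs])
        # M-step: re-estimate weights, means, variances (with a variance floor)
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(sum(r[j] * (x - mus[j]) ** 2
                                   for r, x in zip(resp, data)) / nj, 1e-3)
            weights[j] = nj / len(data)
    return sum(math.log(sum(w * normal_pdf(x, m, v)
                            for w, m, v in zip(weights, mus, variances)) + 1e-300)
               for x in data)

def bic(log_lik, k, n):
    n_params = 3 * k - 1            # k means + k variances + (k-1) free weights
    return n_params * math.log(n) - 2.0 * log_lik

rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)] +
        [rng.gauss(8.0, 1.0) for _ in range(200)])
scores = {k: bic(fit_gmm_1d(data, k), k, len(data)) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)
print(best_k)
```

The same idea, scaled up to 12-dimensional timbre vectors and many more candidate component counts, is what selects 46 timbre categories in the thesis.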
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether clustering was effective, and closer musical scrutiny.
All of the weaknesses mentioned in the previous section, barring perhaps the songs
available in the Million Song Dataset, can be addressed with extensions and
modifications to the code base I created. The greater issue of building an effective
corpus of music data for the MSD, and constantly updating it, might be addressed by
soliciting such data from an organization like Spotify, but such an endeavor is very
ambitious and beyond the scope of any individual or small research group without
extensive funding and influence. Once these problems are resolved, the dataset songs
accessed, and methods for comparing songs to each other established, the next steps
would be to analyze the results further. How do the most unique artists for their
time compare to the most popular artists? Is there considerable overlap? How long
does it take for a style to grow in popularity, if it does at all? And lastly, how can
these findings be used to compose new genres of music and envision who and what
will become popular in the future? All of these questions may require supplementary
information sources, with respect to the popularity of songs and artists for example,
and many of these additional pieces of information can be found on the website of
the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows and groups such as Spotify amass greater amounts of information, and deeper
insights on that information, this relatively new field of study will hopefully grow
with it. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has, up
until now, mostly been analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time()-time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords)-1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0]-1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c/json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time()-time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean timbre over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time()-time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t/json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time()-time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time()-time_start)
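To make the chord-change indexing above concrete: each chord is encoded as a pair (quality 1 to 4, root 0 to 11), and a transition between two chords maps into one of 12 × 16 = 192 categories. The helper below is written for illustration (it is not one of the thesis modules) but mirrors the same arithmetic:

```python
def chord_shift_category(c1, c2):
    """Map a chord transition, each chord encoded as (quality 1-4, root 0-11),
    to one of 12 * 16 = 192 chord-change categories."""
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]      # wrap around the octave
    key_shift = 4 * (c1[0] - 1) + c2[0]      # 1..16: ordered pair of qualities
    return 12 * (key_shift - 1) + note_shift # 0..191

# e.g. C major -> A minor: (1, 0) -> (2, 9)
print(chord_shift_category((1, 0), (2, 9)))  # prints 21
```

The 16 quality pairs times 12 possible root shifts account for all 192 bins, and dividing each bin count by the song's duration (as the loop above does) normalizes for song length.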
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235
236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237
238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in
        segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the most
likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((
                stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((
                stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((
                stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((
                stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*
                (np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
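The per-template loop above computes a smoothed Pearson-style correlation between a template and a 12-bin vector. As a sanity check, here is a vectorized NumPy sketch of the same quantity; the function name, test templates, and pitch profile are my own illustrations, not part of the thesis code:

```python
import numpy as np

def template_correlation(template, pitch_vector, eps=0.01):
    # Smoothed Pearson-style correlation mirroring the loop in
    # find_most_likely_chord; eps guards against zero standard deviations.
    t = np.asarray(template, dtype=float)
    p = np.asarray(pitch_vector, dtype=float)
    num = np.sum((t - t.mean()) * (p - p.mean()))
    return num / ((t.std() + eps) * (p.std() + eps))

# A C-major template should correlate better with an energy profile peaked
# at C, E, and G than a D-major template does.
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
d_major = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
profile = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1]
assert template_correlation(c_major, profile) > template_correlation(d_major, profile)
```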
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.
[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, Jul. 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec. 2005.
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change; type 60 to minor → minor with no note change; type
120 to dominant 7th major → dominant 7th major with no note change; and type 180
to dominant 7th minor → dominant 7th minor with no note change. It makes sense
that type 0, 60, 120, and 180 chord changes are frequently observed, because it
implies that adjacent chords in a song remain in the same key for the majority
of the song. The timbre categories, on the other hand, are more difficult to
interpret intuitively. Mauch's study addresses this issue by sampling songs and
sounds that are closest to each timbre category, playing the sounds, and
attaching interpretations based on several listeners [8]. While this strategy
worked in Mauch's study, it was not practical in mine given the time and
resources at my disposal. Instead, I compared my subjective summaries of each
cluster against the charts to see whether certain peaks in the timbre categories
corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord
change data are very similar for each cluster. This problem does not occur when
α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of
the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they
created that were novel for their time:
• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists
to see whether they were groundbreaking contributors to electronic music. Some
research revealed that these artists were indeed groundbreaking for their time,
so my findings are consistent with existing literature. The difference between
existing accounts and mine, however, is that from a quantitatively computed
perspective I found some new connections (like observing that one of Jarre's
works, when sped up, sounded very similar to trance music).
For larger values of α it is worth not only looking at interesting phenomena in
the clusters formed for that specific value but also comparing those clusters to
the ones formed at other values of α. Since we are increasing the value of α,
more clusters will be formed and the distinctions between each cluster will be
more nuanced. With α = 0.1 the Dirichlet Process formed 16 clusters. Two of
these clusters consisted of only one song each, and upon listening, neither of
these songs sounded particularly unique, so I threw those two clusters out and
analyzed the remaining 14. Comparing these clusters to the ones formed with
α = 0.05, I found that some of the clusters mapped over nicely while others were
more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1)
contained a similar number of songs, and a similar distribution of release
years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and
then steadily rise in popularity through the 2000s. Both clusters also contain
similar types of music: house beats and ethereal synths reminiscent of ambient
or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in
cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a
particularly novel song, "Birds of Tin," for the year it was released (1980).
This song features a sharp and twangy synth beat that, when sped up, sounded
like minimalist acid house music. While the α = 0.05 group differentiated mostly
on general moods and classes of instruments (like rock vs. non-electronic vs.
electronic), the α = 0.1 group picked up more nuanced instrumentation and mood
differences. For example, cluster 16_0.1 contained songs that featured
orchestral string instruments, especially violin. The songs themselves varied
significantly according to traditional genres, from Brian Eno arrangements with
classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which
contained violin interludes. This clustering raises an interesting point: music
that sounds very different based on traditional genres could be grouped together
on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline
through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy
left tail, indicating a style more popular in the 1980s, and its characteristic
sound, hi-hat cymbals, is also a specialized instrument. This specialization
does not match up particularly strongly with the clusters when α = 0.05. That
is, a single cluster with α = 0.05 does not easily map to one or more clusters
in the α = 0.1 run, although many of the clusters appear to share
characteristics based on the qualitative descriptions in the tables. The
timbre/chord change charts for each cluster appeared to at least somewhat
corroborate the general characteristics I attached to each cluster. For example,
the last timbre category is significantly pronounced for clusters 5 and 18, and
especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds,
so it would make sense that cluster 5, which was mainly calm new age, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely
dominated all the others. Many songs in this cluster were marked by strong and
repetitive beats reminiscent of the Roland TR synth drum machine, which matches
the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same
peak timbre category, were noted for containing strong and repetitive beats. For
this run, I added the following artists and their contributions to the general
list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process
resulted in 22 clusters. Three of these clusters contained only one song each,
and upon listening to each of these songs, I determined they were not
particularly unique and discarded them, for a total of 19 remaining clusters.
Unlike the previous two values of α, where the clusters were relatively easy to
subjectively differentiate, this one was quite difficult. Slightly more than
half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained
under 100. Some of the clusters were easily mapped to clusters in the other two
α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and
is comparable to cluster 28_0.1. However, many of the other classifications
seemed more dubious. Not only did the songs within each cluster often vary
significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the
difficulty of distinguishing the clusters. The y-axes for all of the years are
quite small, implying that many of the timbre values averaged out because the
songs in each cluster were quite different. Essentially, the observations are
quite noisy and do not have
features that stand out as saliently as those of cluster 28_0.1, for example.
The only exceptions to these numbers were clusters 30 and 34, but there were so
few songs in each of these clusters that they represent only a small portion of
the dataset. I therefore concluded that the Dirichlet Process with α = 0.2 did
not adequately cluster the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not
differentiate. From this analysis, the most appropriate genre classifications of
the electronic music from the MSD are the clusters described in the table for
α = 0.1, and the most novel artists, along with their contributions, are
summarized in the findings for α = 0.05 and α = 0.1.
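The dependence on the concentration parameter α discussed throughout this chapter can be reproduced in miniature. The thesis code imports sklearn.mixture; the sketch below instead uses scikit-learn's current BayesianGaussianMixture with a Dirichlet Process prior, and the synthetic "song features" are my stand-ins, not the thesis pipeline:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Three loose blobs of synthetic 4-dimensional data standing in for the
# per-song chord-change / timbre-category feature vectors.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 4)) for c in (0.0, 3.0, 6.0)])

def n_effective_clusters(alpha):
    # weight_concentration_prior plays the role of the DP concentration
    # alpha: larger values allow the model to keep more active components.
    dpgmm = BayesianGaussianMixture(
        n_components=15,
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        random_state=0).fit(X)
    # Count components that received non-negligible mixture weight.
    return int(np.sum(dpgmm.weights_ > 0.01))

for alpha in (0.05, 0.1, 0.2):
    print(alpha, n_effective_clusters(alpha))
```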
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies to
address those weaknesses; I then offer potential paths for researchers to build
upon my experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there were
various limiting factors, some beyond my control and others within my control
but unrealistic to address given the time and resources I had. The largest issue
was the dataset I was working with. While the MSD contained roughly 23,000
electronic music songs according to my classifications, these songs did not come
close to covering all of the electronic music that was available. From looking
through the tracks, I did see many important artists, meaning that the dataset
had some credibility. However, there were several other artists I was surprised
to see missing, and the artists included were represented by only a limited
number of popular songs. Some traditionally defined genres, like dubstep, were
missing entirely from the dataset, and the most recent songs came from the year
2010, which meant that the past 5 years of rapid expansion in EM were not
accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications.
For this reason, I considered my thesis to be a proof of concept: although the
data I used may not be ideal, I was able to show that the Dirichlet Process
could be used with some amount of success to cluster songs based on their
metadata.

With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads
is a difficult task, and unlike Pandora, which used trained music theory experts
to analyze each song, I relied on my own ears and anecdotal knowledge of EM.
Given the lack of formal literature quantitatively analyzing EM and the
resources I had, this was my best realistic option, but it was also not ideal.
The second notable weakness, which was more controllable, was determining what
exactly constitutes an EM song. My criteria involved iterating through every
song and selecting those whose artist carried a tag that fell inside a list of
predetermined EM genres. However, this strategy is not always effective, since
some artists have only a small selection of EM songs and have produced much more
music in rock or other non-EM genres. To prevent these songs from appearing in
the dataset, I would need to load another dataset from a group called Last.fm,
which contains user-generated tags at the song level.
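A song-level filter of the kind described might look like the following sketch; the track IDs, tag lists, and tag vocabulary here are invented for illustration and are not drawn from the actual Last.fm dataset:

```python
# Hypothetical song-level tag data: a mixed-genre artist whose catalog
# contains both an EM song and a rock song. With song-level tags the
# filter can keep the former and drop the latter.
EM_TAGS = {'house', 'techno', 'trance', 'ambient', 'breakbeat'}

song_tags = {
    'TRAAAAA1': ['techno', 'dance'],      # EM song
    'TRAAAAA2': ['rock', 'alternative'],  # non-EM song by the same artist
}

def is_em_song(track_id):
    # A song qualifies if any of its own tags is a known EM genre.
    return any(tag in EM_TAGS for tag in song_tags.get(track_id, []))

em_tracks = [t for t in song_tags if is_em_song(t)]
```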
Another, more addressable, weakness in my experiment was graphically analyzing
the timbre categories. While the average chord changes were easy to interpret on
the graphs for each cluster and had easy semantic interpretations, the timbre
categories were never formally defined. That is, while I knew the Bayes
Information Criterion was lowest when there were 46 categories, I did not
associate each timbre category with a sound. Mauch's study addressed this issue
by randomly selecting songs with sounds that fell in each timbre category and
asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed were nontrivial: I could not only eyeball the
timbre measurements on each graph, as I did in this thesis, but also use them to
confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that
normalized measurements between all songs, there are additional ways I could
have improved the feature set. For example, one study looks at more advanced
ways to isolate specific timbre segments in a song, identify repeating patterns,
and compare songs to each other in terms of the similarity of their timbres
[15]. More advanced methods like these would allow me to analyze more
quantitatively how successfully the Dirichlet Process clusters songs into
distinct categories.
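The BIC sweep mentioned above (46 timbre categories minimized the criterion) can be sketched on synthetic data; the stand-in "timbre frames" and the resulting cluster count below are illustrative, not the thesis result:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 12-dimensional "timbre frames": three well-separated blobs
# standing in for the sampled MSD segments.
rng = np.random.RandomState(0)
frames = np.vstack([rng.normal(loc=m, scale=0.3, size=(150, 12))
                    for m in (-2.0, 0.0, 2.0)])

# Fit GMMs with an increasing number of components and keep the count
# whose Bayes Information Criterion is lowest.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(frames).bic(frames)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print(best_k)
```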
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter
definitions and procedures, evaluations of whether clustering was effective, and
closer musical scrutiny. All of the weaknesses mentioned in the previous
section, barring perhaps the songs available in the Million Song Dataset, can be
addressed with extensions and modifications to the code base I created. The
greater issue of building an effective corpus of music data for the MSD, and
constantly updating it, might be addressed by soliciting such data from an
organization like Spotify, but such an endeavor is very ambitious and beyond the
scope of any individual or small-group research project without extensive
funding and influence. Once these problems are resolved, and once access to
songs from the dataset and methods for comparing songs to each other are in
place, the next steps would be to further analyze the results. How do the most
unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity,
if it does at all? And lastly, how can these findings be used to compose new
genres of music and envision who and what will become popular in the future? All
of these questions may require supplementary information sources, with respect
to the popularity of songs and artists for example, and many of these additional
pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects,
it does show that the methods implemented yield nontrivial results and could
serve as a foundation for future quantitative analysis of electronic music. As
data analytics grows and groups such as Spotify amass greater amounts of
information and deeper insights into that information, this relatively new field
of study will hopefully grow with it. EM is a dynamic, energizing, and
incredibly expressive type of music, and understanding it from a quantitative
perspective pays respect to what has, up until now, been analyzed mostly from a
curious outsider's perspective: qualitatively described but not examined as
thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for each electronic song out of the
MSD HDF5 files.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in
                target_genres):
            print 'found electronic music song at {0} seconds'.format(
                time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} '
                'seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(),
    key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + \
    re.sub('', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) /
            smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(
            smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time
        # segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in
        segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() -
        time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in
        chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() -
        time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) /
            smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(
            smoothing_factor*i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(
        time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg
        in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t
        in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(
    time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
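The chord_shift encoding above packs the chord-quality pair and the root-note shift into one of 192 categories. A small decoder (my own addition, mirroring the arithmetic in the listing) makes the "type 0 / 60 / 120 / 180" changes discussed in Chapter 3 explicit:

```python
# Chord qualities as indexed by find_most_likely_chord: c[0] in {1, 2, 3, 4}.
QUALITIES = ['major', 'minor', 'dominant 7th major', 'dominant 7th minor']

def decode_chord_shift(chord_shift):
    # Invert chord_shift = 12*(key_shift - 1) + note_shift, where
    # key_shift = 4*(q1 - 1) + q2 for chord qualities q1, q2 in 1..4.
    key_shift, note_shift = chord_shift // 12 + 1, chord_shift % 12
    q1, q2 = divmod(key_shift - 1, 4)   # zero-based quality indices
    return QUALITIES[q1], QUALITIES[q2], note_shift

# The four self-transitions called out in the Results chapter:
assert decode_chord_shift(0) == ('major', 'major', 0)
assert decode_chord_shift(60) == ('minor', 'minor', 0)
assert decode_chord_shift(120) == ('dominant 7th major', 'dominant 7th major', 0)
assert decode_chord_shift(180) == ('dominant 7th minor', 'dominant 7th minor', 0)
```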
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/ \
                year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} ' \
                    'seconds after start of program'.format(edm_textfile,
                    time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object[
                    'timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in
                    timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(
            edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in
    CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in
    CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in
    CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in
    CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [[ 1.38679881e-01,  3.95702571e-02,  2.65410235e-02,
                     7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
                     8.71851698e-03, -1.17595855e-02,  1.07227900e-02,
                     8.75951680e-03,  5.40391877e-03,  6.17638908e-03],
                   [ 3.14344510e+00,  1.17405599e-01,  4.08053561e+00,
                    -1.77934450e+00,  2.93367968e+00, -1.35597928e+00,
                    -1.55129489e+00,  7.75743158e-01,  6.42796685e-01,
                     1.40794256e-01,  3.37716831e-01, -3.27103815e-01],
                   [ 3.56548165e-01,  2.73288705e+00,  1.94355982e+00,
                     1.06892477e+00,  9.89739475e-01, -8.97330631e-02,
                     8.73234495e-01, -2.00747009e-03,  3.44488367e-01,
                     9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
                   [ 4.22442037e-01,  4.14115783e-01,  1.43926557e-01,
                    -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
                    -6.83151409e-02,  9.86816882e-02,  2.43219098e-02,
                     6.93558977e-02,  6.80121418e-03,  3.97485360e-02],
                   [ 1.94727799e-01, -1.39027782e+00, -2.39875671e-01,
                    -2.84583677e-01,  1.92334219e-01, -2.83421048e-01,
                     2.15787541e-01,  1.14840341e-01, -2.15631833e-01,
                    -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
                   [ 1.96565167e-01,  4.98702717e-02, -3.43697282e-01,
                     2.54170701e-01,  1.12441266e-02,  1.54740401e-01,
                    -4.70447408e-02,  8.10868802e-02,  3.03736697e-03,
                     1.43974944e-03, -2.75044913e-02,  1.48634678e-02],
                   [ 2.21364497e-01, -2.96205105e-01,  1.57754028e-01,
                    -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
                    -1.38139882e-01, -5.54936599e-02,  1.66886836e-01,
                     6.46238260e-02,  1.24093863e-02, -2.09274345e-02],
                   [ 2.12823455e-01, -9.32652720e-02, -4.39611467e-01,
                    -2.02814479e-01,  4.98638770e-02, -1.26572488e-01,
                    -1.11181799e-01,  3.25075635e-02,  2.01416694e-02,
                    -5.69216463e-02,  2.61922912e-02,  8.30817468e-02],
                   [ 1.62304042e-01, -7.34813956e-03, -2.02552550e-01,
                     1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
                    -6.20429191e-03, -6.08892354e-02,  1.02883628e-02,
                     3.84878478e-02, -8.72920419e-03,  2.37291230e-02],
                   [ 1.69023095e-01,  6.81311168e-02, -3.71039856e-02,
                    -2.13139780e-02, -4.18752028e-03,  1.36407740e-01,
                     2.58515825e-02, -4.10328777e-04,  2.93149920e-02,
                    -1.97874734e-02,  2.01177066e-02,  4.29260690e-03],
                   [ 4.16829358e-01, -1.28384095e+00,  8.86081556e-01,
                     9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
                    -3.19865507e-02, -1.71517045e-02,  3.47472066e-02,
                    -3.53047665e-02,  5.58354602e-02, -5.06222122e-02],
                   [ 3.83948137e-01,  1.06020034e-01,  4.01191058e-01,
                     1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
                     2.27589858e-02, -5.67352733e-02,  3.84666644e-02,
                    -2.15828055e-02, -1.67817151e-02,  1.15426241e-01],
                   [ 9.07946444e-01,  3.26120397e+00,  2.98472002e+00,
                    -1.42615404e-01,  1.29886103e+00, -4.53380431e-01,
                     1.54008478e-01, -3.55297093e-02, -2.95809181e-01,
                     1.57037690e-01, -7.29692046e-02,  1.15180285e-01],
                   [ 1.60870896e+00, -2.32038235e+00, -7.96211044e-01,
                     1.55058968e+00, -2.19377663e+00,  5.01030526e-01,
                    -1.71767279e+00, -1.36642470e+00, -2.42837527e-01,
                    -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
                   [ 6.42870687e-01,  1.34486839e+00,  2.16026845e-01,
                    -2.13180345e-01,  3.10866747e-01, -3.97754955e-01,
                    -3.54439151e-01, -5.95938041e-04,  4.95054274e-03,
                     4.67013422e-02, -1.80823854e-02,  1.25808320e-01],
                   [ 1.16780496e+00,  2.28141229e+00, -3.29418720e+00,
                    -1.54239912e+00,  2.12372153e-01,  2.51116768e+00,
                     1.84273560e+00, -4.06183916e-01,  1.19175125e+00,
                    -9.24407446e-01,  6.85444429e-01, -6.38729005e-01],
                   [ 2.39097414e-01, -1.13382447e-02,  3.06327342e-01,
                     4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
                     3.46533705e-02,  1.46440386e-02,  6.88291154e-02,
                     1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
                   [ 1.74850329e-01, -1.86077411e-01,  2.69285838e-01,
                     5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
                    -5.01920042e-03, -1.14565540e-02, -2.61300268e-03,
                    -6.94872458e-03,  1.20157063e-02,  2.01341977e-02],
                   [ 1.93220674e-01,  1.62738332e-01,  1.72794061e-02,
                     7.89933755e-02,  1.58494767e-01,  9.04541006e-04,
                    -3.33177052e-02, -1.42411500e-01, -1.90471155e-02,
                    -2.41622739e-02, -2.57382438e-02,  2.84895062e-02],
                   [ 3.31179197e+00, -1.56765268e-01,  4.42446188e+00,
                     2.05496297e+00,  5.07031622e+00, -3.52663849e-02,
                    -5.68337901e+00, -1.17825301e+00,  5.41756637e-01,
                    -3.15541339e-02, -1.58404846e+00,  7.37887234e-01],
                   [ 2.36033237e-01, -5.01380019e-01, -7.01568834e-02,
                    -2.14474169e-01,  5.58739133e-01, -3.45340886e-01,
                     2.36469930e-01, -2.51770230e-02, -4.41670143e-01,
                    -1.73364633e-01,  9.92353986e-03,  1.01775476e-01],
                   [ 3.13672832e+00,  1.55128891e+00,  4.60139512e+00,
                     9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
                    -3.00065797e+00, -4.41556909e-01, -7.77546208e-01,
                    -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
                   [ 8.50714148e-01,  2.28658856e-01, -3.65260753e+00,
                     2.70626948e+00, -1.90441544e-01,  5.66625676e+00,
                     1.77531510e+00,  2.39978921e+00,  1.10965660e+00,
                     1.58484130e+00, -1.51579214e-02,  8.64324026e-01],
                   [ 1.14302559e+00,  1.18602811e+00, -3.88130412e+00,
                     8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
                     8.56022598e-01, -1.08015106e+00,  1.74840192e-01,
                    -1.35493558e-02, -1.17012561e+00,  1.68572940e-01],
                   [ 3.54117814e+00,  6.12714769e-01,  7.67585243e+00,
                     2.50391333e+00,  1.81374399e+00, -1.46363231e+00,
                    -1.74027236e+00, -5.72924078e-01, -1.20787368e+00,
                    -4.13954661e-01, -4.62561948e-01,  6.78297871e-01],
                   [ 8.31843044e-01,  4.41635485e-01,  7.00724425e-02,
                    -4.72159900e-02,  3.08326493e-01, -4.47009822e-01,
                     3.27806057e-01,  6.52370380e-01,  3.28490360e-01,
                     1.28628172e-01, -7.78065861e-02,  6.91343399e-02],
                   [ 4.90082031e-01, -9.53180204e-01,  1.76970476e-01,
                     1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
                     3.91808304e-01,  2.19368239e-01, -2.06483291e-01,
                    -6.25044005e-02, -1.05547224e-01,  3.18934196e-01],
                   [ 1.49899454e+00, -4.30708817e-01,  2.43770498e+00,
                     7.03149621e-01, -2.28827845e+00,  2.70195855e+00,
                    -4.71484280e+00, -1.18700075e+00, -1.77431396e+00,
                    -2.23190236e+00,  8.20855264e-01, -2.35859902e-01],
                   [ 1.20322544e-01, -3.66300816e-01, -1.25699953e-01,
                    -1.21914056e-01,  6.93277338e-02, -1.31034684e-01,
                    -1.54955924e-03,  2.48094288e-02, -3.09576314e-02,
                    -1.66369415e-03,  1.48904987e-04, -1.42151992e-02],
                   [ 6.52394765e-01, -6.81024464e-01,  6.36868117e-01,
                     3.04950208e-01,  2.62178992e-01, -3.20457080e-01,
                    -1.98576098e-01, -3.02173163e-01,  2.04399765e-01,
                     4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
                   [ 2.06762180e-01, -2.08101829e-01,  2.61977630e-01,
                    -1.71672300e-01,  5.61794250e-02,  2.13660185e-01,
                     3.90259585e-02,  4.78176392e-02,  1.72812607e-02,
                     3.44052067e-02,  6.26899067e-03,  2.48544728e-02],
                   [ 7.39717363e-01,  4.37786285e+00,  2.54995502e+00,
                     1.13151212e+00, -3.58509503e-01,  2.20806129e-01,
                    -2.20500355e-01, -7.22409824e-02, -2.70534083e-01,
                     1.07942098e-03,  2.70174668e-01,  1.87279353e-01],
                   [ 1.25593809e+00,  6.71054880e-02,  8.70352571e-01,
                    -4.32607959e+00,  2.30652217e+00,  5.47476105e+00,
                    -6.11052479e-01,  1.07955720e+00, -2.16225471e+00,
                    -7.95770149e-01, -7.31804973e-01,  9.68935954e-01],
                   [ 1.17233757e-01, -1.23897829e-01, -4.88625265e-01,
                     1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
                    -1.17525019e-02,  5.70221674e-02, -7.67796123e-03,
                     4.17505873e-02, -2.33375716e-02,  1.94121001e-02],
                   [ 1.67511025e+00, -2.75436700e+00,  1.45345593e+00,
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
'''helper methods to process raw MSD data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (quality, root) pair
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-35, Dec. 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).
For larger values of α it is worth not only looking at interesting phenomena in
the clusters formed for that specific value, but also comparing the clusters formed
to those for other values of α. Since we are increasing the value of α, more clusters will
be formed and the distinctions between each cluster will be more nuanced. With
α = 0.1 the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely while others were more difficult to interpret. For example, cluster
3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar
distribution of release years to cluster 9_0.05. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.
One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 group differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked
up more nuanced instrumentation and mood differences. For example, cluster 16_0.1
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New World, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this cluster I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value the Dirichlet Process resulted
in 22 clusters formed. 3 of these clusters contained only one song each, and upon
listening to each of these songs I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR
drum machine sounds and is comparable to cluster 28_0.1. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also support the difficulty
in distinguishing different clusters. The y-axes for all of the years are quite small,
implying that many of the timbre values averaged out because the songs were quite
different in each cluster. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as, for example, those of cluster 28_0.1. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small amount of the dataset. Therefore I
concluded that the Dirichlet Process with α = 0.2 did an insufficient job of
clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
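The role the concentration parameter plays in the cluster counts above can be sketched with scikit-learn's truncated Dirichlet Process mixture. The snippet below is a toy illustration on synthetic Gaussian blobs, not on the thesis's chord-change and timbre features, and it uses the current BayesianGaussianMixture API rather than the DPGMM class available when this thesis was written; the "number of clusters" is read off as the number of components that retain non-negligible posterior weight.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# toy stand-in for the per-song feature vectors: three well-separated blobs
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 4))
               for c in (0.0, 3.0, 6.0)])

def n_effective_clusters(alpha):
    # truncated DP mixture with concentration parameter alpha;
    # unused components are driven to near-zero weight
    dpgmm = BayesianGaussianMixture(
        n_components=10,
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        random_state=0)
    dpgmm.fit(X)
    return int(np.sum(dpgmm.weights_ > 0.01))

for alpha in (0.05, 0.1, 0.2):
    print(alpha, n_effective_clusters(alpha))
```

On the real, much noisier song features, larger α leaves more components with appreciable weight, which is why the cluster count grows from 9-ish to 16 to 22 across the three runs.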
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies to address
those weaknesses, then I offer potential paths for researchers to build upon my
experiment, and finally I offer closing words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there
were various complicating factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to all of the electronic
music that was available. From looking through the tracks I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included contained
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansions in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists contain only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset,
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayesian Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the measurements on each graph for timbre, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successfully the Dirichlet
Process clusters songs into distinct categories.
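As a minimal illustration of the kind of song-to-song comparison this would enable, the sketch below scores two songs by the cosine similarity of their mean timbre vectors. This is only a crude stand-in, not the pattern-based timbre models of [15]: the `segments_timbre` inputs are assumed to be lists of per-segment 12-dimensional timbre vectors, as extracted from the MSD in Appendix A.1.

```python
import numpy as np

def timbre_signature(segments_timbre):
    """Collapse a song's per-segment 12-d timbre vectors into one
    mean vector, giving a crude song-level timbre signature."""
    return np.asarray(segments_timbre, dtype=float).mean(axis=0)

def timbre_similarity(song_a, song_b):
    """Cosine similarity between two songs' timbre signatures:
    1.0 for identical direction, near 0 for unrelated sounds."""
    a = timbre_signature(song_a)
    b = timbre_signature(song_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A pairwise similarity matrix built from such a score would give a quantitative check on whether songs placed in the same cluster really do sound alike.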
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether the clustering was effective, and closer musical
scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps
the songs available in the Million Song Dataset, can be addressed with extensions
and modifications to the code base I created. The greater issue of building an
effective corpus of music data for the MSD, and constantly updating it, might be
addressed by soliciting such data from an organization like Spotify, but such an
endeavor is very ambitious and beyond the scope of any individual or small-group
research project without extensive funding and influence. Once these problems are
resolved, and the dataset, the songs accessed from the dataset, and methods for
comparing songs to each other are in place, the next steps would be to further
analyze the results. How do the most unique artists for their time compare to the
most popular artists? Is there considerable overlap? How long does it take for a
style to grow in popularity, if it even does? And lastly, how can these findings be
used to compose new genres of music and envision who and what will become
popular in the future? All of these questions may require supplementary information
sources, with respect to the popularity of songs and artists for example, and many
of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even more, and groups such as Spotify amass greater amounts of information
and deeper insights on that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly analyzed from a curious outsider's perspective: qualitatively
described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code finds every electronic music song in the MSD and saves its relevant metadata, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = 'h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*.' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# mean of a list (used column-wise over a block of segments)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
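The script above only samples and duration-normalizes timbre frames; the step that actually produced the timbre categories (the component count at which the Bayes Information Criterion was lowest, as discussed in Chapter 4) is not shown in the listing. The sketch below is a hedged illustration of how such a BIC-driven model selection can be done with scikit-learn's GaussianMixture; the random blobs stand in for the real frames written to timbre_frames_all.txt, and the candidate range is kept tiny for speed.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Made-up stand-in for the sampled timbre frames; the real code would load
# the frames collected by the script above instead.
rng = np.random.RandomState(0)
frames = np.vstack([rng.normal(0, 1, (150, 3)), rng.normal(8, 1, (150, 3))])

# Fit a mixture for each candidate component count and keep the lowest BIC.
best_k, best_bic = None, np.inf
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic = k, bic

print(best_k)
```

With two well-separated blobs, BIC should settle on 2 components; on the real frame data the same loop, run over a wider range, is what would surface a minimum at 46 categories.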
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C natural
# and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-03, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (quality, root) pair
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
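The template-matching idea used by the helpers above can be demonstrated in isolation. The following is a hypothetical two-template sketch, not the thesis code: a chroma vector is scored against each binary chord template with the same Pearson-style correlation (including the +0.01 smoothing terms), and the best-scoring template wins.

```python
import numpy as np

# Two illustrative binary templates (C major = C, E, G; C minor = C, Eb, G).
TEMPLATES = {
    'C major': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'C minor': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
}

def best_template(pitch_vector):
    """Return the template name with the largest absolute correlation."""
    best, rho_max = None, 0.0
    pv = np.asarray(pitch_vector, dtype=float)
    for name, chord in TEMPLATES.items():
        ch = np.asarray(chord, dtype=float)
        rho = np.sum((ch - ch.mean()) * (pv - pv.mean())) / ((ch.std() + 0.01) * (pv.std() + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max, best = rho, name
    return best

# A chroma frame with most energy on C, E, and G should match the major template.
print(best_template([0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]))
```

The full helper does the same thing over all four qualities and twelve roots, returning a (quality, root) pair instead of a name.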
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec. 2005.
artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different by traditional genre labels can be grouped together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, high-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre and chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machine, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this cluster I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α runs, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing the clusters: the y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small fraction of the dataset. I therefore concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
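The way the concentration parameter α governs how many clusters a Dirichlet Process produces can be illustrated with a small, self-contained simulation of the Chinese Restaurant Process, the sequential view of the Dirichlet Process. This is an illustrative sketch, not the thesis code, and the customer counts and α values are chosen only for the demonstration: larger α opens new "tables" (clusters) more readily.

```python
import random

def crp_tables(n_customers, alpha, rng):
    """Seat customers one by one under a Chinese Restaurant Process.

    Customer n joins an existing table with probability proportional to its
    size, or opens a new table with probability alpha / (n + alpha).
    """
    tables = []  # tables[i] = number of customers seated at table i
    for n in range(n_customers):
        r = rng.random() * (n + alpha)
        running = 0.0
        for i, size in enumerate(tables):
            running += size
            if r < running:
                tables[i] += 1
                break
        else:
            tables.append(1)  # open a new table, i.e. a new cluster
    return tables

rng = random.Random(0)
for alpha in (0.05, 0.1, 0.2):
    avg = sum(len(crp_tables(1000, alpha, rng)) for _ in range(200)) / 200.0
    print("alpha=%.2f, average clusters=%.2f" % (alpha, avg))
```

The average number of tables grows roughly like α·log(n), which is why sweeping α from 0.05 to 0.2, as above, coarsens or refines the clustering.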
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies for addressing them; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the timbre measurements on each graph, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
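The song-level filtering idea mentioned above can be sketched in a few lines. This is a hypothetical illustration only: TRACK_TAGS is made-up sample data standing in for per-track, user-generated tags of the kind Last.fm provides, and TARGET_GENRES is a shortened stand-in for the genre list used in Appendix A.1.

```python
# Shortened stand-in for the predetermined EM genre list.
TARGET_GENRES = {'house', 'techno', 'trance', 'ambient', 'idm'}

# Made-up per-track tags; the real data would come from a song-level tag dataset.
TRACK_TAGS = {
    'TR0001': ['rock', 'guitar'],
    'TR0002': ['techno', 'electronic'],
    'TR0003': ['ambient', 'chillout'],
}

def is_em_track(tags):
    """Keep a track only if its own tags (not its artist's) hit a target genre."""
    return any(t.lower() in TARGET_GENRES for t in tags)

em_tracks = sorted(tid for tid, tags in TRACK_TAGS.items() if is_em_track(tags))
print(em_tracks)
```

Filtering at the track level this way would exclude the occasional rock song by an artist who also carries an EM tag, which the artist-level criterion cannot do.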
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, the dataset's songs are accessible, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights into that information, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for each electronic song out of the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song and runs the Dirichlet Process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
time_start)57 calculate chord changes58 for i in range(0len(most_likely_chords)-1)59 c1 = most_likely_chords[i]60 c2 = most_likely_chords[i+1]61 if (c1[1] == c2[1])62 note_shift = 063 elif (c1[1] lt c2[1])64 note_shift = c2[1] - c1[1]65 else66 note_shift = 12 - c1[1] + c2[1]67 key_shift = 4(c1[0]-1) + c2[0]68 convert note_shift (0 through 11) and key_shift (1 to 16)69 to one of 196 categories for a chord shift70 chord_shift = 12(key_shift - 1) + note_shift71 chord_changes[chord_shift] += 1
59
72 json_object_new[rsquochord_changesrsquo] = [cjson_object[rsquodurationrsquo] for c inchord_changes]
73 print rsquocalculated chord changes at 0 secondsrsquoformat(timetime()-time_start)
74
75 for i in range(0int(mathfloor(len(segments_timbre_old))smoothing_factor))
76 segments = segments_timbre_old[(smoothing_factori)(smoothing_factori+smoothing_factor)]
77 calculate mean frequency of each note over a block of 5 timesegments
78 segments_mean = map(mean zip(segments))79 segments_timbre_old_smoothedappend(segments_mean)80 print rsquofound most likely timbre categories at 0 secondsrsquoformat(time
time()-time_start)81 timbre_cats = [msd_utilsfind_most_likely_timbre_category(seg) for seg
in segments_timbre_old_smoothed]82 timbre_cat_counts = [timbre_catscount(i) for i in xrange(030)]83 json_object_new[rsquotimbre_cat_countsrsquo] = [tjson_object[rsquodurationrsquo] for t
in timbre_cat_counts]84 all_song_dataappend(json_object_new)85 count += 186
87 print rsquopreprocessing finished writing results to file at time 0rsquoformat(timetime()-time_start)
88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90
91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)
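As a sanity check on the encoding used in the listing above, the following toy sketch (illustrative code, not part of the thesis; the helper name is invented) isolates the chord-shift mapping. A chord is a (quality, root) pair with quality 1 through 4 (major, minor, dominant 7th, minor 7th, as returned by find_most_likely_chord in A.4) and root 0 through 11, so key_shift ranges over 1 to 16 and the category index 12*(key_shift - 1) + note_shift ranges over 0 to 191, giving the 192 categories counted in chord_changes.

```python
def chord_shift_category(c1, c2):
    # c1, c2 are chords as (quality, root): quality in 1..4, root in 0..11
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        # wrap around the 12 pitch classes
        note_shift = 12 - c1[1] + c2[1]
    # 16 possible quality transitions, 12 possible root shifts
    key_shift = 4 * (c1[0] - 1) + c2[0]
    return 12 * (key_shift - 1) + note_shift

# C major -> A minor: quality 1 -> 2, root 0 -> 9
print(chord_shift_category((1, 0), (2, 9)))  # 12*(2-1) + 9 = 21
```

Enumerating all quality and root pairs confirms the categories exactly cover 0 through 191.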
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
                    1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
                    1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
                    1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
                    1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
                    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
                    2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (quality, root) pair
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
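The template-correlation idea behind find_most_likely_chord can be seen in isolation in the following toy sketch. This is illustrative code, not part of the thesis: the two templates and the best_chord helper are invented for the example, which scores a 12-bin pitch vector against binary chord templates with a Pearson-style correlation and keeps the best match.

```python
import numpy as np

C_MAJOR = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)  # C, E, G
A_MINOR = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0], dtype=float)  # A, C, E
TEMPLATES = {'C major': C_MAJOR, 'A minor': A_MINOR}

def best_chord(pitch_vector):
    pitch_vector = np.asarray(pitch_vector, dtype=float)
    best, rho_max = None, 0.0
    for name, tmpl in TEMPLATES.items():
        # correlation between the template shape and the observed chroma energies;
        # the 0.01 terms guard against division by zero, as in A.4
        rho = (np.sum((tmpl - tmpl.mean()) * (pitch_vector - pitch_vector.mean()))
               / ((tmpl.std() + 0.01) * (pitch_vector.std() + 0.01)))
        if abs(rho) > abs(rho_max):
            best, rho_max = name, rho
    return best

# a chroma vector with most energy on C, E, and G should match the C major template
print(best_chord([0.9, 0.1, 0.0, 0.1, 0.8, 0.1, 0.0, 0.7, 0.1, 0.2, 0.0, 0.1]))
```

The thesis's version does the same thing over 48 chord templates (12 roots for each of four chord qualities) and over the 46 timbre cluster centers.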
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, mar 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, mar 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, oct 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, jan 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, jul 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, mar 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, mar 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, dec 2005.
that certain clusters like 28 contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For these clusters I
added the following artists and their contributions to the general list of novel artists:
• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters; 3 of these clusters contained only one song each, and upon
listening to each of these songs I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters (10 out of 19) contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17.02, which contains Roland TR
drum machine sounds and is comparable to cluster 28.01. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster
often vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing different clusters: the y-axis ranges for all of the years are quite small,
implying that many of the timbre values averaged out because the songs in each cluster
were quite different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28.01, for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small portion of the dataset. Therefore I
concluded that the Dirichlet Process with α = 0.2 did not adequately cluster
the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and producing clusters that a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.
Chapter 4
Conclusion
In this chapter I first address weaknesses in my experiment and strategies to mitigate
them; I then offer potential paths for researchers to build upon my experiment, and
close with final words regarding this thesis.
4.1 Design Flaws in Experiment
While I made every effort to ensure the integrity of this experiment, there
were various limiting factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to covering all of the electronic
music that was available. From looking through the tracks I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included were represented by
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason I considered my thesis to be a proof of concept. Although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.
With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist carried a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists have only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayes Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the measurements on each graph for timbre, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successfully the Dirichlet
Process clusters songs into distinct categories.
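The BIC-based choice of the number of timbre categories mentioned above can be illustrated as follows. This is a toy sketch on synthetic two-dimensional data rather than the thesis's 12-dimensional timbre frames, using scikit-learn's GaussianMixture: fit mixtures of increasing size and keep the size that minimizes the Bayes Information Criterion.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(1)
# synthetic data with three well-separated groups
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(150, 2)) for c in (0, 4, 8)])

bics = {}
for n in range(1, 8):
    gmm = GaussianMixture(n_components=n, random_state=0).fit(X)
    bics[n] = gmm.bic(X)  # lower BIC = better fit after complexity penalty

best_n = min(bics, key=bics.get)
print('BIC is minimized at {0} components'.format(best_n))
```

On the real timbre frames the same loop, run over a wider range of component counts, is what would bottom out at 46 categories.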
4.2 Future Work
Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions and
procedures, evaluations of whether clustering was effective, and closer musical scrutiny.
All of the weaknesses mentioned in the previous section, barring perhaps the songs
available in the Million Song Dataset, can be addressed with extensions and
modifications to the code base I created. The greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small group research project without
extensive funding and influence. Once these problems are resolved, the songs in the
dataset are accessible, and methods for comparing songs to each other are in place,
the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows and groups such as Spotify amass greater amounts of information
and deeper insights into that information, this relatively new field of study will
hopefully grow with it. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has, up
until now, mostly been analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code collects the metadata of each electronic song needed to compute
chord-change frequencies and run the Dirichlet process later.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# note: in Python 2 a plain dict does not preserve the sorted order;
# collections.OrderedDict would keep the chronological ordering
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', str(sys.argv[1])) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents, re.DOTALL):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
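The inner loop above encodes each pair of consecutive chords as one of 192 = 12 × 16 categories. For clarity, the same mapping as a standalone function (the function name is hypothetical; the thesis code computes this inline):

```python
def chord_shift_category(c1, c2):
    """Map a pair of consecutive chord labels to one of 192 categories.

    Each chord is (chord_type, root): chord_type in 1..4 (major, minor,
    dominant 7th, minor 7th) and root in 0..11 (C natural up to B).
    """
    # semitone distance from the first root up to the second, mod 12
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    # 1..16: which ordered pair of chord types was involved
    key_shift = 4 * (c1[0] - 1) + c2[0]
    # 12 possible note shifts x 16 type pairs = 192 categories
    return 12 * (key_shift - 1) + note_shift

# C major -> G major: same chord type, root moves up 7 semitones
print(chord_shift_category((1, 0), (1, 7)))  # -> 7
```

Every category index stays below 192, which is why `chord_changes` has exactly 192 slots.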
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
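The acceptance test above amounts to per-year downsampling: a song from year y survives with probability min(1, N / year_counts[y]), so every year contributes roughly N songs regardless of how many it has. A self-contained sketch of that scheme (the names here are hypothetical, not the thesis code):

```python
import random

def sample_per_year(items_by_year, n, seed=0):
    """Keep roughly n items per year by accepting each item
    with probability min(1.0, n / count_for_that_year)."""
    rng = random.Random(seed)
    kept = []
    for year in sorted(items_by_year):
        items = items_by_year[year]
        p = min(1.0, float(n) / len(items))
        kept.extend(item for item in items if rng.random() < p)
    return kept

# a sparse year keeps everything; a dense year is thinned toward ~20
data = {2005: [('sparse', i) for i in range(5)],
        1990: [('dense', i) for i in range(1000)]}
picked = sample_per_year(data, 20)
```

This keeps the timbre-frame sample from being dominated by the late-2000s years, which have far more songs in the MSD than the 1970s do.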
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

'''some static data used in conjunction with the helper methods'''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [
    [1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
    [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
    [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
    [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
    [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
    [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [
    [1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
    [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
    [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
    [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
    [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
    [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [
    [1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
    [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
    [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
    [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
    [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
    [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [
    [1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
    [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
    [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
    [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
    [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
    [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the most
likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
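find_most_likely_chord and find_most_likely_timbre_category both pick the template whose Pearson-style correlation with the input vector is largest in absolute value. A stripped-down sketch of the same idea using NumPy's built-in correlation (hypothetical names, and only two of the templates above):

```python
import numpy as np

# two chroma templates drawn from the tables above: C major (C, E, G)
# and A minor (A, C, E)
TEMPLATES = {
    'C major': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'A minor': [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
}

def best_template(pitch_vector):
    """Return the template name whose Pearson correlation with the
    12-bin chroma vector has the largest absolute value."""
    best_name, rho_max = None, 0.0
    for name, chord in TEMPLATES.items():
        rho = np.corrcoef(chord, pitch_vector)[0, 1]  # off-diagonal of 2x2 matrix
        if abs(rho) > abs(rho_max):
            best_name, rho_max = name, rho
    return best_name

# energy concentrated on C, E and G points to the C-major template
chroma = [0.9, 0.1, 0.0, 0.1, 0.8, 0.1, 0.0, 0.9, 0.1, 0.2, 0.0, 0.1]
print(best_template(chroma))  # -> C major
```

The thesis helpers add 0.01 to each standard deviation to guard against zero-variance vectors; np.corrcoef sidesteps that manual normalization but expresses the same matching criterion.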
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.
features that stand out as saliently as cluster 2801 for example The only exceptions
to these numbers were clusters 30 and 34 but there were so few songs in each of
these clusters that they represent only a small amount of the dataset Therefore I
concluded that the Dirichlet Process with α = 02 performed an insufficient job of
adequately clustering the songs Overall the clusters formed when α = 01 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and resulting in clusters a minimally trained ear could not differentiate
From this analysis the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 01 and the most
novel artists along with their contributions are summarized in the finidings where
α = 005 and α = 01
52
Chapter 4
Conclusion
In this chapter I first address weakness in my experiment and strategies to address
those weaknesses then I offer potential paths for researchers to build upon my ex-
periment and offer closing words regarding this thesis
41 Design Flaws in Experiment
While I made every effort possible to ensure the integrity of this experiment there
were various factors some beyond my control and others within my control but
unrealistic given the time and resources I had The largest issue was the dataset I
was working with While the MSD contained roughly 23000 electronic music songs
according to my classifications these songs did not come close to all of the electronic
music that was available From looking through the tracks I did see many important
artists meaning that there was some credibility to the dataset However there were
several other artists I was surprised to see missing and the artists included contained
only a limited number of popular songs Some traditionally defined genres like
dubstep were missing entirely from the dataset and the most recent songs came
from the year 2010 which meant that the past 5 years of rapid expansions in EM
were not accounted for Building a sufficient corpus of EM data is very difficult
53
arguably more than for other genres because songs may be remixed by multiple
artists further blurring the line between original content and modifications For this
reason I considered my thesis to be a proof of concept Although the data I used
may not be ideal I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata
With respect to how I implemented the Dirichlet Process and constructed the
features my methodology could have been more extensive with additional time and
resources Interpreting the sounds in each song and establishing common threads is a
difficult task and unlike Pandora which used trained music theory experts to analyze
each song I relied on my own ears and anecdotal knowledge of EM Given the lack of
formal literature quantitatively analyzing EM and the resources I had this was my
best realistic option but was also not ideal The second notable weakness which was
more controllable was determining what exactly constitutes an EM song My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres However this strategy is not
always effective since some artists contain only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres To prevent
these songs from appearing in the dataset I would need to load another dataset
from a group called Lastfm which contains user-generated tags at the song level
Another more addressable weakness in my experiment was graphically analyzing the
timbre categories While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations the timbre categories
were never formally defined That is while I knew the Bayes Information Criterion
was lowest when there were 46 categories I did not associate each timbre category
with a sound Mauchrsquos study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asked users to listen to the sounds and
54
classify what they heard Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial I could not only
eyeball the measurements on each graph for timbre like I did in this thesis but also
use them to confirm the sounds I observed for each cluster Finally while my feature
selection contained careful preprocessing based on other studies that normalized
measurements between all songs there are additional ways I could have improved the
feature set For example one study looks at more advanced ways to isolate specific
timbre segments in a song identify repeating patterns and comparing songs to each
other in terms of the similarity of their timbres [15] More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is on effectively clustering songs into distinct categories
42 Future Work
Future work in this area quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists would involve tighter definitions proce-
dures evaluations of whether clustering was effective and music scrutiny All of the
weaknesses mentioned in the previous section barring perhaps the songs available in
the Million Song Dataset can be addressed with extensions and modifications to the
code base I created Addressing the greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify but such an endeavor is very ambitious
and beyond the scope of any individual or small group research project without ex-
tensive funding and influence Once these problems are resolved and the dataset
songs accessed from the dataset and methods for comparing songs to each other are
accomplished the next steps would be to further analyze the results How do the
most unique artists for their time compare to the most popular artists Is there con-
55
siderable overlap How long does it take for a style to grow in popularity if it even
does And lastly how can these findings be used to compose new genres of music and
envision who and what will become popular in the future All of these questions may
require supplementary information sources with respect to the popularity of songs
and artists for example and many of these additional pieces of information can be
found on the website of the MSD
43 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows and groups such as Spotify amass greater amounts of information, and deeper
insights into that information, this relatively new field of study will hopefully
grow with it. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has until
now been approached mostly from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import collections
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# sort chronologically; an OrderedDict preserves the sorted order
# (a plain dict would discard it in Python 2)
all_song_data_sorted = collections.OrderedDict(
    sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = ('/scratch/network/mssilver/mssilver/msd_data/raw_' +
                   re.sub('/', '', sys.argv[1]) + '.txt')
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# match one stringified song dictionary at a time
for json_object_str in re.finditer(r"{'title'.*?}", json_contents, re.DOTALL):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
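The chord-transition encoding used above is easiest to check in isolation. The helper below restates that mapping on its own (it is a re-statement for illustration, not new thesis code): chords are `(type, root)` pairs with type 1 through 4 (major, minor, dominant seventh, minor seventh) and root 0 through 11, and every transition lands in one of 16 × 12 = 192 categories.

```python
def chord_shift_category(c1, c2):
    """Map a transition between two detected chords to one of 192 categories.

    c1, c2: (chord_type, root) pairs as returned by find_most_likely_chord
    in Appendix A.4: chord_type in 1..4, root in 0..11.
    """
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]  # wrap upward around the octave
    key_shift = 4 * (c1[0] - 1) + c2[0]  # 1..16: ordered pair of chord types
    return 12 * (key_shift - 1) + note_shift  # 0..191
```

A repeated C major chord maps to category 0, while the largest shift (minor seventh to minor seventh, root moving up eleven semitones) maps to 191, which is why the `chord_changes` histogram needs exactly 192 bins.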
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden: read from the working directory instead
# match one stringified song dictionary at a time
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    # fewer than k frames available; keep them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
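The `prob` expression above implements inverse-frequency sampling: each song is kept with probability min(1, N / count for its year), so sparse early years are kept wholesale while dense later years are thinned to about N songs each. Isolated as helpers (the function names are mine, not from the thesis code):

```python
def inclusion_probability(n_target, year_count):
    """Chance of keeping one song so that a year with year_count songs
    contributes about n_target of them in expectation."""
    return min(1.0, float(n_target) / year_count)

def expected_year_contribution(n_target, year_count):
    """Expected number of songs sampled from a year of a given size."""
    return inclusion_probability(n_target, year_count) * year_count
```

With N = 20, the two songs from 1956 are always kept, while each of 2007's 2175 songs is kept with probability 20/2175, contributing about 20 songs in expectation.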
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (family, root): families are 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
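The chord detector in A.4 is template matching: a 12-dimensional pitch vector is scored against each binary chord template with a Pearson-style correlation, and the best-scoring template wins. A compressed NumPy sketch of the same idea, restricted to the twelve major triads for brevity (the thesis version also smooths the denominator with a small constant and compares all four chord families, which this illustration omits):

```python
import numpy as np

# Binary templates for the 12 major triads: row r has ones at the root r,
# the major third (r + 4), and the fifth (r + 7), mirroring
# CHORD_TEMPLATE_MAJOR in A.4.
MAJOR_TEMPLATES = np.zeros((12, 12))
for root in range(12):
    MAJOR_TEMPLATES[root, [root, (root + 4) % 12, (root + 7) % 12]] = 1.0

def best_major_chord(pitch_vector):
    """Return the root (0 = C, ..., 11 = B) of the best-matching major triad."""
    pv = np.asarray(pitch_vector, dtype=float)
    # Pearson correlation of the pitch vector against every template.
    scores = [np.corrcoef(template, pv)[0, 1] for template in MAJOR_TEMPLATES]
    return int(np.argmax(scores))
```

A pitch vector with its energy concentrated on C, E, and G correlates best with the C major template, so `best_major_chord` returns 0 for it.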
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify / Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
42 Future Work
Future work in this area quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists would involve tighter definitions proce-
dures evaluations of whether clustering was effective and music scrutiny All of the
weaknesses mentioned in the previous section barring perhaps the songs available in
the Million Song Dataset can be addressed with extensions and modifications to the
code base I created Addressing the greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify but such an endeavor is very ambitious
and beyond the scope of any individual or small group research project without ex-
tensive funding and influence Once these problems are resolved and the dataset
songs accessed from the dataset and methods for comparing songs to each other are
accomplished the next steps would be to further analyze the results How do the
most unique artists for their time compare to the most popular artists Is there con-
55
siderable overlap How long does it take for a style to grow in popularity if it even
does And lastly how can these findings be used to compose new genres of music and
envision who and what will become popular in the future All of these questions may
require supplementary information sources with respect to the popularity of songs
and artists for example and many of these additional pieces of information can be
found on the website of the MSD
43 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import time
import glob
import re
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):  # pattern reconstructed; matches one song's dict literal
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
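The chord-transition encoding computed inside the A.2 loop can be condensed into a single hypothetical helper (the name chord_shift_index is mine): a chord is a (quality, root) pair, with quality 1 to 4 for major, minor, dominant 7th and minor 7th, and root 0 to 11. Each transition then maps to one of 12 × 16 = 192 categories.

```python
# Self-contained sketch of the 192-category chord-transition encoding from A.2.
# A chord is (quality, root): quality in 1..4, root in 0..11 (C through B).

def chord_shift_index(c1, c2):
    """Map a transition between two chords to an index in 0..191."""
    note_shift = (c2[1] - c1[1]) % 12        # semitones up from old root to new root,
                                             # equivalent to the if/elif/else in A.2
    key_shift = 4 * (c1[0] - 1) + c2[0]      # 1..16: ordered pair of chord qualities
    return 12 * (key_shift - 1) + note_shift

# Example: C major (1, 0) moving to G dominant 7th (3, 7)
print(chord_shift_index((1, 0), (3, 7)))  # prints 31
```

Since key_shift ranges over 1..16 and note_shift over 0..11, the index covers exactly 0..191, matching the length-192 chord_changes array.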
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
                    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
                    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
                    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
                    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
                    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)  # pattern reconstructed
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
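The sampling rule above keeps each song with probability min(1, N / year_count), so years with fewer than N songs keep everything and larger years are thinned to roughly N songs in expectation. A minimal sketch, with function names of my choosing rather than the thesis's:

```python
# Sketch of the per-year inclusion-probability sampling used in A.3.
import random

def inclusion_probability(n_target, year_count):
    # keep each song with probability min(1, N / count): small years keep
    # everything, large years yield about n_target songs in expectation
    return min(1.0, float(n_target) / year_count)

def sample_year(songs, n_target, rng=None):
    rng = rng or random.Random(0)  # fixed seed for a reproducible demo
    p = inclusion_probability(n_target, len(songs))
    return [s for s in songs if rng.random() < p]
```

For example, a year with only 2 songs is kept in full, while a year with 742 songs (2010 in the table above) is kept with probability 20/742 per song.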
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
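The template-matching idea behind find_most_likely_chord can be demonstrated end to end with only the twelve major templates. This is a hedged sketch, not the thesis code: best_major_chord is my name, the chroma vector is invented, and the templates are generated from the root-third-fifth pattern rather than written out.

```python
# Self-contained sketch of the Pearson-style template correlation in A.4:
# score each binary chord template against a 12-bin pitch vector and keep
# the template with the largest absolute correlation.
import numpy as np

# major triad templates: root r plus intervals of 4 and 7 semitones
MAJOR = [[1 if j in ((0 + r) % 12, (4 + r) % 12, (7 + r) % 12) else 0
          for j in range(12)] for r in range(12)]

def best_major_chord(pitch_vector):
    pv = np.asarray(pitch_vector, dtype=float)
    best, rho_max = 0, 0.0
    for idx, chord in enumerate(MAJOR):
        ch = np.asarray(chord, dtype=float)
        # correlation with the same +0.01 smoothing as the thesis helper
        rho = np.sum((ch - ch.mean()) * (pv - pv.mean())) / ((ch.std() + 0.01) * (pv.std() + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max, best = rho, idx
    return best

# A chroma vector with energy on C, E and G matches the C-major template (index 0).
print(best_major_chord([0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]))  # prints 0
```

Because every template has the same mean and standard deviation, the winner is determined entirely by how the energy in the pitch vector lines up with the three template notes.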
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec. 2005.
arguably more than for other genres because songs may be remixed by multiple
artists further blurring the line between original content and modifications For this
reason I considered my thesis to be a proof of concept Although the data I used
may not be ideal I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata
With respect to how I implemented the Dirichlet Process and constructed the
features my methodology could have been more extensive with additional time and
resources Interpreting the sounds in each song and establishing common threads is a
difficult task and unlike Pandora which used trained music theory experts to analyze
each song I relied on my own ears and anecdotal knowledge of EM Given the lack of
formal literature quantitatively analyzing EM and the resources I had this was my
best realistic option but was also not ideal The second notable weakness which was
more controllable was determining what exactly constitutes an EM song My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres However this strategy is not
always effective since some artists contain only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres To prevent
these songs from appearing in the dataset I would need to load another dataset
from a group called Lastfm which contains user-generated tags at the song level
Another more addressable weakness in my experiment was graphically analyzing the
timbre categories While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations the timbre categories
were never formally defined That is while I knew the Bayes Information Criterion
was lowest when there were 46 categories I did not associate each timbre category
with a sound Mauchrsquos study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asked users to listen to the sounds and
54
classify what they heard Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial I could not only
eyeball the measurements on each graph for timbre like I did in this thesis but also
use them to confirm the sounds I observed for each cluster Finally while my feature
selection contained careful preprocessing based on other studies that normalized
measurements between all songs there are additional ways I could have improved the
feature set For example one study looks at more advanced ways to isolate specific
timbre segments in a song identify repeating patterns and comparing songs to each
other in terms of the similarity of their timbres [15] More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is on effectively clustering songs into distinct categories
42 Future Work
Future work in this area quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists would involve tighter definitions proce-
dures evaluations of whether clustering was effective and music scrutiny All of the
weaknesses mentioned in the previous section barring perhaps the songs available in
the Million Song Dataset can be addressed with extensions and modifications to the
code base I created Addressing the greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify but such an endeavor is very ambitious
and beyond the scope of any individual or small group research project without ex-
tensive funding and influence Once these problems are resolved and the dataset
songs accessed from the dataset and methods for comparing songs to each other are
accomplished the next steps would be to further analyze the results How do the
most unique artists for their time compare to the most popular artists Is there con-
55
siderable overlap How long does it take for a style to grow in popularity if it even
does And lastly how can these findings be used to compose new genres of music and
envision who and what will become popular in the future All of these questions may
require supplementary information sources with respect to the popularity of songs
and artists for example and many of these additional pieces of information can be
found on the website of the MSD
43 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music As data analytics
grows even more and groups such as Spotify amass greater amounts of information
and deeper insights on that information this relatively new field of study will hope-
fully grow EM is a dynamic energizing and incredibly expressive type of music
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly analyzed from a curious outsiderrsquos perspective qualitatively
described but not examined as thoroughly from a mathematical angle
56
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for all EM songs found in the MSD.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap',
    'downtempo', 'industrial', 'synthpop', 'idm', 'idm - intelligent dance music',
    '8-bit', 'ambient', 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
A.2 Calculating Most Likely Chords and Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
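The chord-change encoding above maps each pair of consecutive chords to one of 12 * 16 = 192 categories: the root movement in semitones (0 through 11) combined with the (quality, quality) transition between the four chord qualities. A minimal standalone sketch of that arithmetic, using a hypothetical helper name (the thesis computes this inline on the `most_likely_chords` tuples):

```python
# Sketch of the chord-change encoding from A.2. A chord is a tuple
# (quality, root) with quality in 1..4 (major, minor, dom7, min7) and
# root in 0..11; a transition maps to one of 192 categories.
def chord_shift_category(c1, c2):
    # root movement in semitones, wrapped into 0..11
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    # quality transition: 4 * 4 = 16 possible (quality, quality) pairs
    key_shift = 4 * (c1[0] - 1) + c2[0]
    return 12 * (key_shift - 1) + note_shift

# C major -> G major: root moves 7 semitones, both chords have quality 1
print(chord_shift_category((1, 0), (1, 7)))  # -> 7
```

Dividing each category count by the song's duration, as the code does, normalizes for track length so that longer songs do not dominate the feature vector.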
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
    1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563,
    2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782,
    2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
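The sampling rule in A.3 keeps each song with probability min(1, N / count_year), so years with fewer than N songs are kept in full while larger years are downsampled to about N songs in expectation. A small sketch of that acceptance rule (the `year_counts` values here are made up for illustration, not the MSD counts):

```python
import random

# Accept each song with probability min(1, N / songs_in_that_year): years
# with fewer than N songs keep everything; larger years are thinned down
# to roughly N songs in expectation. (Illustrative counts only.)
N = 20
year_counts = {1985: 5, 2005: 2000}

def keep_song(year, rng=random.random):
    prob = min(1.0, 1.0 * N / year_counts[year])
    return rng() < prob

# A year with fewer than N songs is always kept:
print(keep_song(1985, rng=lambda: 0.999))  # -> True
```

Passing the random draw in as `rng` just makes the rule easy to test; the thesis code calls `random.random()` directly.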
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
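The chord detector above scores each binary template with a Pearson-style correlation against the observed 12-bin pitch vector and keeps the best-scoring template. A compact, self-contained illustration of that idea, with NumPy's `corrcoef` standing in for the hand-rolled sum (the two templates are the C rows of the major and minor template tables):

```python
import numpy as np

# Correlate an observed 12-bin chroma vector against two binary chord
# templates; the higher correlation picks the more likely chord. This
# mirrors the Pearson-style score in find_most_likely_chord.
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # C, E, G
c_minor = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]   # C, Eb, G

# Chroma with energy concentrated on C, E, and G:
observed = [0.9, 0.1, 0.0, 0.1, 0.8, 0.1, 0.0, 0.9, 0.1, 0.0, 0.1, 0.0]

rho_major = np.corrcoef(c_major, observed)[0, 1]
rho_minor = np.corrcoef(c_minor, observed)[0, 1]
print(rho_major > rho_minor)  # -> True: the C-major template fits better
```

The `+0.01` terms in the thesis code guard against division by zero when a template or segment is flat; `corrcoef` omits that guard, which is fine for this illustrative input.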
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
classify what they heard Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial I could not only
eyeball the measurements on each graph for timbre like I did in this thesis but also
use them to confirm the sounds I observed for each cluster Finally while my feature
selection contained careful preprocessing based on other studies that normalized
measurements between all songs there are additional ways I could have improved the
feature set For example one study looks at more advanced ways to isolate specific
timbre segments in a song identify repeating patterns and comparing songs to each
other in terms of the similarity of their timbres [15] More advanced methods like
these would allow me to more quantitatively analyze how successful the Dirichlet
Process is on effectively clustering songs into distinct categories
42 Future Work
Future work in this area quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists would involve tighter definitions proce-
dures evaluations of whether clustering was effective and music scrutiny All of the
weaknesses mentioned in the previous section barring perhaps the songs available in
the Million Song Dataset can be addressed with extensions and modifications to the
code base I created Addressing the greater issue of building an effective corpus of
music data for the MSD and constantly updating it might be addressed by soliciting
such data from an organization like Spotify but such an endeavor is very ambitious
and beyond the scope of any individual or small group research project without ex-
tensive funding and influence Once these problems are resolved and the dataset
songs accessed from the dataset and methods for comparing songs to each other are
accomplished the next steps would be to further analyze the results How do the
most unique artists for their time compare to the most popular artists Is there con-
55
siderable overlap How long does it take for a style to grow in popularity if it even
does And lastly how can these findings be used to compose new genres of music and
envision who and what will become popular in the future All of these questions may
require supplementary information sources with respect to the popularity of songs
and artists for example and many of these additional pieces of information can be
found on the website of the MSD
43 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music As data analytics
grows even more and groups such as Spotify amass greater amounts of information
and deeper insights on that information this relatively new field of study will hope-
fully grow EM is a dynamic energizing and incredibly expressive type of music
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly analyzed from a curious outsiderrsquos perspective qualitatively
described but not examined as thoroughly from a mathematical angle
56
Appendix A
Code
A1 Pulling Data from the Million Song Dataset
1 from __future__ import division2 import os3 import sys4 import time5 import glob6 import hdf5_getters not on adroit7
8 prevents output from showing ellipses when printed9 npset_printoptions(threshold=npnan)
10
11 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo
12
13 basedir = rsquoscratchnetworkmssilvermssilvermsd_data_fulldatarsquo +str(sysargv[1])
14 ext = rsquoh5rsquo15
16 target_genres = [rsquohousersquorsquotechnorsquorsquodrum and bassrsquorsquodrum n bassrsquorsquodrumrsquonrsquobassrsquo
17 rsquodrumnbassrsquorsquodrum rsquonrsquo bassrsquorsquojunglersquorsquobreakbeatrsquorsquotrancersquorsquodubsteprsquorsquotraprsquorsquodowntemporsquo
18 rsquoindustrialrsquorsquosynthpoprsquorsquoidmrsquorsquoidm - intelligent dance musicrsquorsquo8-bitrsquorsquoambientrsquo
19 rsquodance and electronicarsquorsquoelectronicrsquo]20
21 relevant metadata for all EM songs found in the MSD22 all_song_data = 23 pitch_segs_data = []24 count = 025 start_time = timetime()26
27 for root dirs files in oswalk(basedir)28 files = globglob(ospathjoin(rootrsquorsquo+ext))29 for f in files
57
30 h5 = hdf5_gettersopen_h5_file_read(f)31 if year unknown throw out sample32 if hdf5_gettersget_year(h5) == 033 h5close()34 continue35 if any(tag in str(hdf5_gettersget_artist_mbtags(h5)) for tag in
target_genres)36 print rsquofound electronic music song at 0 secondsrsquoformat(time
time()-start_time)37 count += 138 print (rsquosong count 0rsquoformat(count+1))39 h5_subdict = dict()40 h5_subdict[rsquotitlersquo] = hdf5_gettersget_title(h5)item()41 h5_subdict[rsquoartist_namersquo] = hdf5_gettersget_artist_name(h5)
item()42 h5_subdict[rsquoyearrsquo] = hdf5_gettersget_year(h5)item()43 h5_subdict[rsquodurationrsquo] = hdf5_gettersget_duration(h5)item()44 h5_subdict[rsquotimbrersquo] = hdf5_gettersget_segments_timbre(h5)
tolist()45 h5_subdict[rsquopitchesrsquo] = hdf5_gettersget_segments_pitches(h5)
tolist()46 track_id = hdf5_gettersget_track_id(h5)item()47 all_song_data[track_id] = h5_subdict48 print(rsquoSong 0 finished processing Total time elapsed 1
secondsrsquoformat(countstr(timetime() - start_time)))49 h5close()50
51 all_song_data_sorted = dict(sorted(all_song_dataitems() key=lambda k k[1][rsquoyearrsquo]))
52 sortedpitchdata = rsquoscratchnetworkmssilvermssilvermsd_dataraw_rsquo +resub(rsquorsquorsquorsquosysargv[1]) + rsquotxtrsquo
53 with open(sortedpitchdata rsquowrsquo) as text_file54 text_filewrite(str(all_song_data_sorted))
A2 Calculating Most Likely Chords and Timbre
Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1

    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
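The chord-shift encoding used in the listing above can be illustrated in isolation. The sketch below (the helper name `chord_shift_category` is hypothetical, not part of the thesis code) maps a pair of (quality, root) chord labels to one of the 12 × 16 = 192 shift categories, using the same arithmetic as the listing:

```python
def chord_shift_category(c1, c2):
    """Map a chord transition to one of 192 categories.

    Each chord is a (quality, root) pair: quality in 1..4
    (major, minor, dominant 7th, minor 7th) and root in 0..11 (C..B).
    """
    # root movement in semitones, wrapped into 0..11
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    # quality transition: 4 x 4 ordered quality pairs, numbered 1..16
    key_shift = 4 * (c1[0] - 1) + c2[0]
    # combine into a single index in 0..191
    return 12 * (key_shift - 1) + note_shift

# C major -> G major: root rises 7 semitones, quality stays major
print(chord_shift_category((1, 0), (1, 7)))  # 7
# A minor -> E dominant 7th
print(chord_shift_category((2, 9), (4, 4)))  # 91
```

Because both the root movement and the quality pair are bounded, every transition lands in a fixed-size histogram bin, which is what lets the per-song `chord_changes` vector have a constant length.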
A.3 Code to Compute Timbre Categories
from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
               1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
               1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
               1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
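The year-balancing step in the listing caps each year's expected contribution at roughly N sampled songs, so heavily represented years (e.g. 2007) do not dominate the timbre clustering. A minimal standalone sketch of that acceptance probability (the function name `acceptance_prob` is hypothetical):

```python
def acceptance_prob(n_target, year_count):
    """Probability of keeping a song so that each year contributes
    about n_target songs in expectation, regardless of its size."""
    ratio = float(n_target) / year_count
    return 1.0 if ratio > 1.0 else ratio

# A sparse year (3 songs) keeps everything; a dense year is thinned.
print(acceptance_prob(20, 3))     # 1.0
print(acceptance_prob(20, 2000))  # 0.01
```

Accepting each song independently with this probability is a simple alternative to drawing an exact fixed-size sample per year, and it only requires one pass over the files.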
A.4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
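The helper functions above score each candidate template by a Pearson-style correlation between the 12-bin template and the observed vector. The compact numpy sketch below shows the same idea for the major-triad templates only (simplified relative to the thesis code, which hand-rolls the correlation and regularizes the denominator; `most_likely_major_root` is a hypothetical name):

```python
import numpy as np

# 12 rotations of the C-major template (ones at C, E, G)
MAJOR_TEMPLATES = [np.roll([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], k) for k in range(12)]

def most_likely_major_root(pitch_vector):
    """Return the root (0=C .. 11=B) whose major-triad template
    correlates most strongly with the observed 12-bin pitch vector."""
    pitch_vector = np.asarray(pitch_vector, dtype=float)
    scores = [abs(np.corrcoef(t, pitch_vector)[0, 1]) for t in MAJOR_TEMPLATES]
    return int(np.argmax(scores))

# A noisy C-major profile: energy concentrated on C, E, G
observed = [0.9, 0.1, 0.0, 0.1, 0.8, 0.0, 0.1, 0.7, 0.0, 0.1, 0.0, 0.1]
print(most_likely_major_root(observed))  # 0 (C)
```

Because correlation is invariant to the overall loudness of the segment, the match depends only on the shape of the chroma profile, which is why the thesis normalizes by both the template's and the vector's standard deviation.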
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.
siderable overlap. How long does it take for a style to grow in popularity, if it
even does? And lastly, how can these findings be used to compose new genres of music
and to envision who and what will become popular in the future? All of these questions
may require supplementary information sources, with respect to the popularity of songs
and artists, for example, and many of these additional pieces of information can be
found on the website of the MSD.
4.3 Closing Remarks
While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even more, and groups such as Spotify amass greater amounts of information
and deeper insights on that information, this relatively new field of study will hopefully
grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has up
until now been mostly analyzed from a curious outsider's perspective: qualitatively
described but not examined as thoroughly from a mathematical angle.
Appendix A
Code
A.1 Pulling Data from the Million Song Dataset
from __future__ import division
import os
import re
import sys
import time
import glob
import hdf5_getters  # not on adroit
import numpy as np

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm', 'idm - intelligent dance music',
                 '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print ('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub(r'/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
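The filtering-and-sorting core of the listing above can be exercised without the MSD's HDF5 files. The sketch below uses toy in-memory records and a truncated `target_genres` list (both hypothetical) but applies the same substring tag test and chronological ordering:

```python
target_genres = ['house', 'techno', 'trance', 'ambient']

# toy stand-ins for the per-track metadata pulled via hdf5_getters
songs = {
    'TR001': {'title': 'A', 'year': 1998, 'tags': "['uk garage', 'house']"},
    'TR002': {'title': 'B', 'year': 1992, 'tags': "['ambient', 'idm']"},
    'TR003': {'title': 'C', 'year': 2004, 'tags': "['folk rock']"},
}

# keep songs whose stringified tag list mentions any target genre,
# mirroring the any(tag in str(...)) test in the listing
kept = {tid: s for tid, s in songs.items()
        if any(tag in s['tags'] for tag in target_genres)}

# order retained songs chronologically, as the listing does before writing
kept_sorted = sorted(kept.items(), key=lambda kv: kv[1]['year'])
print([tid for tid, _ in kept_sorted])  # ['TR002', 'TR001']
```

Note that the substring test is deliberately loose: a tag like 'electronic' would also match 'electronica', which is the same permissive behavior the `str(...)` comparison in the listing exhibits.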
A2 Calculating Most Likely Chords and Timbre
Categories
1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture
10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 import ast15
16 prevents output from showing ellipses when printed17 npset_printoptions(threshold=npnan)
58
18
19 column-wise mean of list of lists20 def mean(a)21 return sum(a) len(a)22
23 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo
24
25 basedir = rsquoscratchnetworkmssilvermssilverrsquo26 input_file = basedir + rsquomsd_dataraw_rsquo + str(sysargv[1]) + rsquotxtrsquo27 output_file = basedir + rsquomsd_datapreprocessed_rsquo + str(sysargv[1]) + rsquotxtrsquo28
29 json_contents = open(input_filersquorrsquo)read()30
31 all_song_data = []32 time_start = timetime()33 count = 034 for json_object_str in refinditer(rsquorsquotitlersquojson_contents)35 json_object_str = str(json_object_strgroup(0))36 json_object = astliteral_eval(json_object_str)37 json_object_new = 38
39 json_object_new[rsquotitlersquo] = json_object[rsquotitlersquo]40 json_object_new[rsquoartist_namersquo] = json_object[rsquoartist_namersquo]41 json_object_new[rsquoyearrsquo] = json_object[rsquoyearrsquo]42 json_object_new[rsquodurationrsquo] = json_object[rsquodurationrsquo]43
44 segments_pitches_old = json_object[rsquopitchesrsquo]45 segments_timbre_old = json_object[rsquotimbrersquo]46 segments_pitches_old_smoothed = []47 segments_timbre_old_smoothed = []48 chord_changes = [0 for i in range(0192)]49 smoothing_factor = 550 for i in range(0int(mathfloor(len(segments_pitches_old))
smoothing_factor))51 segments = segments_pitches_old[(smoothing_factori)(
smoothing_factori+smoothing_factor)]52 calculate mean frequency of each note over a block of 5 time
segments53 segments_mean = map(mean zip(segments))54 segments_pitches_old_smoothedappend(segments_mean)55 most_likely_chords = [msd_utilsfind_most_likely_chord(seg) for seg in
segments_pitches_old_smoothed]56 print rsquofound most likely chords at 0 secondsrsquoformat(timetime()-
time_start)57 calculate chord changes58 for i in range(0len(most_likely_chords)-1)59 c1 = most_likely_chords[i]60 c2 = most_likely_chords[i+1]61 if (c1[1] == c2[1])62 note_shift = 063 elif (c1[1] lt c2[1])64 note_shift = c2[1] - c1[1]65 else66 note_shift = 12 - c1[1] + c2[1]67 key_shift = 4(c1[0]-1) + c2[0]68 convert note_shift (0 through 11) and key_shift (1 to 16)69 to one of 196 categories for a chord shift70 chord_shift = 12(key_shift - 1) + note_shift71 chord_changes[chord_shift] += 1
59
72 json_object_new[rsquochord_changesrsquo] = [cjson_object[rsquodurationrsquo] for c inchord_changes]
73 print rsquocalculated chord changes at 0 secondsrsquoformat(timetime()-time_start)
74
75 for i in range(0int(mathfloor(len(segments_timbre_old))smoothing_factor))
76 segments = segments_timbre_old[(smoothing_factori)(smoothing_factori+smoothing_factor)]
77 calculate mean frequency of each note over a block of 5 timesegments
78 segments_mean = map(mean zip(segments))79 segments_timbre_old_smoothedappend(segments_mean)80 print rsquofound most likely timbre categories at 0 secondsrsquoformat(time
time()-time_start)81 timbre_cats = [msd_utilsfind_most_likely_timbre_category(seg) for seg
in segments_timbre_old_smoothed]82 timbre_cat_counts = [timbre_catscount(i) for i in xrange(030)]83 json_object_new[rsquotimbre_cat_countsrsquo] = [tjson_object[rsquodurationrsquo] for t
in timbre_cat_counts]84 all_song_dataappend(json_object_new)85 count += 186
87 print rsquopreprocessing finished writing results to file at time 0rsquoformat(timetime()-time_start)
88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90
91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)
A3 Code to Compute Timbre Categories
1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture
10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 from string import ascii_uppercase15 import ast16 import matplotlibpyplot as plt17 import operator18 from collections import defaultdict19 import random20
21 timbre_all = []22 N = 20 number of samples to get from each year23 year_counts = dict(1956 2 1965 4 1968 3 1969 5 1970 23 1971 25
1972 26 1973 37 1974 35 1975 29 1976 28 1977 64 1978 771979 111 1980 131 1981 171 1982 199 1983 272 1984 190 1985
60
189 1986 200 1987 224 1988 205 1989 272 1990 358 1991 3481992 538 1993 610 1994 658 1995 764 1996 809 1997 930 1998872 1999 983 2000 1031 2001 1230 2002 1323 2003 1563 20041508 2005 1995 2006 1892 2007 2175 2008 1950 2009 1782 2010742)
24
25 time_start = timetime()26 year_count = defaultdict(int)27 orig_dir = rsquoscratchnetworkmssilvermssilverrsquo28 orig_dir = rsquorsquo29 json_pattern = recompile(rsquorsquotitlersquoreDOTALL)30 N = 20 number of songs to sample from each year31 k = 20 number of frames to select from each song32 for l1 in ascii_uppercase33 for l2 in ascii_uppercase34 edm_textfile = orig_dir + rsquomsd_dataraw_rsquo + l1 + l2 + rsquotxtrsquo35 json_contents = open(edm_textfilersquorrsquo)read()36 for json_object_str in refindall(json_patternjson_contents)37 json_object = astliteral_eval(json_object_str)38 year = int(json_object[rsquoyearrsquo])39 prob = 10 if 10Nyear_counts[year] gt 10 else 10N
year_counts[year]40 if randomrandom() lt prob41 print rsquogetting timbre frames for song in directory 0 1
seconds after start of programrsquoformat(edm_textfiletimetime()-time_start)
42 duration = float(json_object[rsquodurationrsquo])43 timbre = [[tduration for t in l] for l in json_object[rsquo
timbrersquo]]44 try45 indices = randomsample(xrange(0len(timbre))k)46 except47 indices = xrange(0len(timbre))48 timbre_frames = [timbre[i] for i in indices]49 appended_timbre = [timbre_allappend(l) for l in
timbre_frames]50 print rsquofinished file 0 1 seconds after start of programrsquoformat(
edm_textfiletimetime()-time_start)51
52 with(open(rsquotimbre_frames_alltxtrsquorsquowrsquo)) as f53 fwrite(str(timbre_all))
A4 Helper Methods for Calculations
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C natural
# and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03,
     -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02,
     1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00,
     2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01,
     6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00,
     9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03,
     3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01,
     -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02,
     2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,
     1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01,
     -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01,
     1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02,
     3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02,
     -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02,
     1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,
     4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02,
     2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01,
     -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02,
     1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02,
     -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04,
     2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02,
     -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02,
     3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01,
     -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02,
     3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01,
     1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02,
     -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00,
     -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00,
     -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01,
     3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04,
     4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00,
     2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01,
     1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03,
     -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02,
     6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02,
     -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02,
     -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02,
     1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01,
     -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00,
     5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00,
     5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,
     5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02,
     -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01,
     -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01,
     -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00,
     -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00,
     1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01,
     -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00,
     1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00,
     1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01,
     -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02,
     3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01,
     3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01,
     -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01,
     -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01,
     -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00,
     -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,
     6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02,
     -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01,
     2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01,
     2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01,
     5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02,
     1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00,
     -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02,
     -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00,
     2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00,
     -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01,
     -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02,
     -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00,
     -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01,
     -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01,
     1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01,
     -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00,
     4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01,
     2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00,
     8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00,
     -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01,
     9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02,
     6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02,
     -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05,
     1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00,
     -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00,
     -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01,
     -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01,
     1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01,
     2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02,
     -8.18113578e-03, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00,
     1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01,
     -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01,
     1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03,
     4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00,
     -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01,
     6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord: (type, root), with type 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
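Both matchers above share one idea: score a 12-dimensional observation against each template with a Pearson-style correlation and keep the best-scoring match. A self-contained toy sketch of that scoring (the two templates and the `best_chord` helper are illustrative only; the thesis code correlates against the full major/minor/dom7/min7 template sets and all timbre clusters):

```python
import numpy as np

# Illustrative binary pitch-class templates (C major and C minor only).
TEMPLATES = {
    'Cmaj': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'Cmin': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
}

def best_chord(pitch_vector):
    """Return the template name with the largest correlation score."""
    pv = np.asarray(pitch_vector, dtype=float)
    best_name, rho_max = None, 0.0
    for name, template in TEMPLATES.items():
        c = np.asarray(template, dtype=float)
        # Same smoothed correlation as find_most_likely_chord: the +0.01
        # terms keep the denominator finite for flat (zero-variance) inputs.
        rho = np.sum((c - c.mean()) * (pv - pv.mean())) / ((c.std() + 0.01) * (pv.std() + 0.01))
        if abs(rho) > abs(rho_max):
            best_name, rho_max = name, rho
    return best_name

# Energy concentrated on C, E and G matches the major template.
print(best_chord([0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1]))  # -> Cmaj
```

Because the score is a correlation rather than a raw dot product, a uniformly loud segment does not favor any template, and only the shape of the pitch-class profile matters.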
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.
[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-35, December 2005.
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the most
likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (family, root), where family 1 = major,
    # 2 = minor, 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    template_families = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)]
    for family, chords, means, stdevs in template_families:
        for idx, (chord, mean, stdev) in enumerate(zip(chords, means, stdevs)):
            # regularized correlation between the pitch vector and the template
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                       ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (family, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / \
                   ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
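Both matchers above score a 12-dimensional observation against a bank of fixed templates using a hand-rolled, regularized correlation, and keep the template whose score has the largest magnitude. The same idea can be sketched more compactly with NumPy's `corrcoef`; the two chroma templates and the toy pitch vector below are illustrative stand-ins (the thesis uses 48 chord templates across four families), not values from the dataset:

```python
import numpy as np

# Hypothetical 12-bin chroma templates (C major and A minor triads),
# bins ordered C, C#, D, ..., B as in the thesis templates.
TEMPLATES = {
    ('major', 'C'): [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    ('minor', 'A'): [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
}

def best_template(pitch_vector):
    """Return the template whose Pearson correlation with the observed
    chroma vector has the largest absolute value, and that correlation."""
    best, rho_max = None, 0.0
    for name, template in TEMPLATES.items():
        # corrcoef returns the 2x2 correlation matrix; [0, 1] is rho
        rho = np.corrcoef(template, pitch_vector)[0, 1]
        if abs(rho) > abs(rho_max):
            best, rho_max = name, rho
    return best, rho_max

# Toy chroma frame with energy concentrated on C, E, and G
frame = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1]
name, rho = best_template(frame)
print(name, round(rho, 3))  # the C major template wins (rho is about 0.99)
```

Unlike the listing above, `corrcoef` normalizes by the exact standard deviations; the thesis code adds 0.01 to each standard deviation so that flat (zero-variance) segments do not cause a division by zero.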
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235
236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237
238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in
segments_pitches]242 return segments_pitches_new243
65
244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250
251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo
252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR
CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR
CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7
CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7
CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285
286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS
TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)
(npstd(timbre_vector)+001))
66
293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat
67
Bibliography
[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012
[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide
[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014
[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014
[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014
[6] About the music genome project httpwwwpandoracomaboutmgp
[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012
# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the Dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time()-time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords)-1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0]-1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1

    json_object_new['chord_changes'] = [c/json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time()-time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time()-time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t/json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished; writing results to file at time {0}'.format(time.time()-time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time()-time_start)
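The chord-shift encoding in the loop above packs note_shift (0 through 11) and key_shift (1 through 16) into 12 x 16 = 192 distinct bins, which is why chord_changes has 192 slots. A minimal sketch of that encoding (the helper name chord_shift_bin is illustrative, not from the thesis code):

```python
# Sketch: check that the chord-shift encoding above gives each
# (key_shift, note_shift) pair its own bin in [0, 192).
def chord_shift_bin(key_shift, note_shift):
    # key_shift in 1..16 (ordered pair of 4 chord qualities),
    # note_shift in 0..11 (semitone interval between chord roots)
    return 12 * (key_shift - 1) + note_shift

bins = {chord_shift_bin(k, n) for k in range(1, 17) for n in range(12)}
print(len(bins), min(bins), max(bins))  # 192 0 191
```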
A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
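The sampling rule in the loop above keeps each song with probability min(1, N / year_counts[year]), so every year contributes roughly N songs in expectation no matter how many songs it has. A small sketch of that rule (the function name sample_prob is illustrative; the three example counts come from the year_counts table above):

```python
# Sketch of the year-balancing sample probability used in the loop above.
N = 20  # target number of songs per year

year_counts = {1956: 2, 1990: 358, 2007: 2175}  # values from the table above

def sample_prob(year):
    # keep a song with probability min(1, N / count), so sparse years
    # are kept whole and dense years are thinned down to ~N songs
    p = 1.0 * N / year_counts[year]
    return 1.0 if p > 1.0 else p

print(sample_prob(1956))          # 1.0 (only 2 songs in 1956, keep them all)
print(sample_prob(2007) < 0.01)   # True (2175 songs, keep ~20)
```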
A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
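find_most_likely_chord above scores a pitch vector against each binary chord template with a Pearson-style correlation and keeps the template with the largest absolute score. A self-contained sketch of the same idea, restricted to the 12 major-chord templates (the names best_major_chord, MAJOR_ROOT, and TEMPLATES are illustrative, not from the thesis code):

```python
import numpy as np

# Binary template for C major (C, E, G); rotating it right gives the
# other 11 major chords, matching CHORD_TEMPLATE_MAJOR above.
MAJOR_ROOT = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
TEMPLATES = [MAJOR_ROOT[12 - i:] + MAJOR_ROOT[:12 - i] for i in range(12)]

def best_major_chord(pitch_vector):
    # return the root (0 = C, ..., 11 = B) whose template correlates best
    pv = np.asarray(pitch_vector, dtype=float)
    best, rho_max = 0, 0.0
    for idx, chord in enumerate(TEMPLATES):
        ch = np.asarray(chord, dtype=float)
        # Pearson-style score with the same +0.01 smoothing as the thesis code
        rho = np.sum((ch - ch.mean()) * (pv - pv.mean())) / ((ch.std() + 0.01) * (pv.std() + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max, best = rho, idx
    return best

# a noisy C-major chroma vector: energy concentrated on C, E, G
print(best_major_chord([0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]))  # 0 (C)
```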
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec. 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
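The 42 centroids above were computed offline (the A.3 listing, "Code to Compute Timbre Categories"). As an editorial sketch only — the thesis's exact clustering procedure is in A.3, and the frame data below is random placeholder rather than real Million Song Dataset segments — centroids of this shape can be obtained by running a plain Lloyd's-algorithm k-means over 12-dimensional timbre frames:

```python
import numpy as np

def kmeans(frames, k, iters=20, seed=0):
    """Toy Lloyd's algorithm: returns k centroids of the given frames."""
    rng = np.random.default_rng(seed)
    # initialize centroids as k distinct frames chosen at random
    centroids = frames[rng.choice(len(frames), size=k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned frames
        for j in range(k):
            members = frames[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids

# placeholder stand-in for real 12-dimensional MSD timbre frames
frames = np.random.default_rng(1).normal(size=(500, 12))
centroids = kmeans(frames, k=42)
assert centroids.shape == (42, 12)
```

With real data, `frames` would hold the sampled timbre segments pulled in A.3, and the resulting `centroids` array would play the role of `TIMBRE_CLUSTERS`.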
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
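As a quick illustration of the template-correlation scoring that `find_most_likely_chord` applies across all four chord families, the snippet below (an editorial sketch, not part of the thesis listings; the template constant and chroma values are made up) correlates a chroma vector against a single C-major template:

```python
import numpy as np

# Binary template for a C-major triad: energy expected on C, E, G
C_MAJOR_TEMPLATE = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]

def template_correlation(pitch_vector, template):
    """Pearson-style correlation with the same +0.01 smoothing as the code above."""
    t_mean, t_std = np.mean(template), np.std(template)
    p_mean, p_std = np.mean(pitch_vector), np.std(pitch_vector)
    rho = 0.0
    for i in range(12):
        rho += (template[i] - t_mean) * (pitch_vector[i] - p_mean) / ((t_std + 0.01) * (p_std + 0.01))
    return rho

# A chroma vector concentrated on C, E, G scores high against the template;
# a perfectly flat vector has zero covariance with it and scores zero.
chroma = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]
flat = [0.5] * 12
assert template_correlation(chroma, C_MAJOR_TEMPLATE) > 0
assert abs(template_correlation(flat, C_MAJOR_TEMPLATE)) < 1e-9
```

`find_most_likely_chord` runs this same score against every transposition of the major, minor, dominant-7th, and minor-7th templates and keeps the (family, root) pair with the largest absolute correlation.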
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
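The per-cluster scalars computed above can be illustrated with a small standalone sketch; the two 12-dimensional centroids below are hypothetical stand-ins for rows of TIMBRE_CLUSTERS, used only for demonstration:

```python
import numpy as np

# Two hypothetical 12-dim timbre centroids (not the thesis values).
clusters = [np.array([0.2, -0.1] * 6), np.array([1.5, -2.0] * 6)]

# np.mean/np.std collapse each 12-dim centroid to a single scalar; these
# scalars later normalize the correlation scores against each cluster.
means = [np.mean(t) for t in clusters]    # approximately [0.05, -0.25]
stdevs = [np.std(t) for t in clusters]    # approximately [0.15, 1.75]
```

Note that each cluster contributes exactly one mean and one standard deviation, regardless of its dimension.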
'''helper methods to process raw MSD data'''
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key)
                            for pitch_seg in segments_pitches]
    return segments_pitches_new
def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
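As a sanity check, the rotation above can be exercised on a labeled chroma vector. This sketch restates the same logic so it runs without the MSD helpers:

```python
# Standalone restatement of transpose_by_key: output entry i is chroma bin
# (i + key) mod 12, so the song's tonic always lands at index 0.
def transpose_by_key(pitch_seg, key):
    return [pitch_seg[(i + key) % 12] for i in range(12)]

# A song in D (key = 2): the bins for D, D#, ... rotate to the front.
chroma = list(range(12))            # label bins 0..11 (C, C#, ..., B)
print(transpose_by_key(chroma, 2))  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1]
```

With key = 0 the vector is returned unchanged, so already-C-rooted songs are unaffected.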
''' given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
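The score accumulated above is a Pearson-style correlation between a chord template and the observed pitch vector. The following standalone sketch shows the idea with a single hypothetical C-major template (1s on C, E, G); the real CHORD_TEMPLATE_* constants are defined earlier in the appendix:

```python
import numpy as np

# Hypothetical C-major template: 1s on the C, E, and G chroma bins.
C_MAJOR = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)

def template_score(template, pitch_vector):
    # Pearson-style correlation; 0.01 is added to each standard deviation
    # to avoid dividing by zero on flat (all-equal) vectors.
    num = np.sum((template - np.mean(template)) *
                 (pitch_vector - np.mean(pitch_vector)))
    return num / ((np.std(template) + 0.01) * (np.std(pitch_vector) + 0.01))

c_triad = np.array([.9, .1, .1, .1, .8, .1, .1, .9, .1, .1, .1, .1])
d_triad = np.roll(c_triad, 2)  # same shape shifted up a whole step
# the template correlates far more strongly with the matching triad
assert template_score(C_MAJOR, c_triad) > template_score(C_MAJOR, d_triad)
```

Scanning this score over every template in every chord family and keeping the largest magnitude is exactly what find_most_likely_chord does.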
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # center the input vector by its own mean
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) \
                / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
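The selection step is an argmax of the absolute correlation score over all clusters. A hedged sketch, with three hypothetical centroids standing in for TIMBRE_CLUSTERS:

```python
import numpy as np

# Hypothetical 12-dim centroids (flat, rising, and oscillating) used only
# to illustrate the selection; the thesis uses the TIMBRE_CLUSTERS rows.
CENTROIDS = [np.ones(12), np.arange(12.0), np.cos(np.arange(12.0))]

def best_cluster(timbre_vector):
    # same correlation score as the chord matcher, computed per centroid
    def score(seg):
        rho = np.sum((seg - np.mean(seg)) *
                     (timbre_vector - np.mean(timbre_vector)))
        return rho / ((np.std(seg) + 0.01) * (np.std(timbre_vector) + 0.01))
    # keep the index with the largest absolute correlation
    return int(np.argmax([abs(score(seg)) for seg in CENTROIDS]))

# a steadily rising timbre vector matches the rising centroid (index 1)
print(best_cluster(np.arange(12.0)))  # 1
```

The flat centroid always scores zero (its deviations vanish), which is why the +0.01 guard on the standard deviations matters.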
Bibliography
[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.
[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.
[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.
[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.
[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.
[6] About the Music Genome Project. http://www.pandora.com/about/mgp.
[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.
[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.
[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.
[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.
[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
  - 1.1 Background Information
  - 1.2 Literature Review
  - 1.3 The Dataset
- 2 Mathematical Modeling
  - 2.1 Determining Novelty of Songs
  - 2.2 Feature Selection
  - 2.3 Collecting Data and Preprocessing Selected Features
    - 2.3.1 Collecting the Data
    - 2.3.2 Pitch Preprocessing
    - 2.3.3 Timbre Preprocessing
- 3 Results
  - 3.1 Methodology
  - 3.2 Findings
    - 3.2.1 α = 0.05
    - 3.2.2 α = 0.1
    - 3.2.3 α = 0.2
  - 3.3 Analysis
- 4 Conclusion
  - 4.1 Design Flaws in Experiment
  - 4.2 Future Work
  - 4.3 Closing Remarks
- A Code
  - A.1 Pulling Data from the Million Song Dataset
  - A.2 Calculating Most Likely Chords and Timbre Categories
  - A.3 Code to Compute Timbre Categories
  - A.4 Helper Methods for Calculations
- Bibliography
65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]
100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00
63
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235
236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237
238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in
segments_pitches]242 return segments_pitches_new243
65
244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250
251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo
252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR
CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR
CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7
CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7
CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((
stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285
286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS
TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)
(npstd(timbre_vector)+001))
66
293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat
67
Bibliography
[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012
[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide
[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014
[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014
[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014
[6] About the music genome project httpwwwpandoracomaboutmgp
[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012
[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015
[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011
[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011
[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from
68
the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013
[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012
[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014
[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999
[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005
69
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
-
- 11 Background Information
- 12 Literature Review
- 13 The Dataset
-
- 2 Mathematical Modeling
-
- 21 Determining Novelty of Songs
- 22 Feature Selection
- 23 Collecting Data and Preprocessing Selected Features
-
- 231 Collecting the Data
- 232 Pitch Preprocessing
- 233 Timbre Preprocessing
-
- 3 Results
-
- 31 Methodology
- 32 Findings
-
- 321 =005
- 322 =01
- 323 =02
-
- 33 Analysis
-
- 4 Conclusion
-
- 41 Design Flaws in Experiment
- 42 Future Work
- 43 Closing Remarks
-
- A Code
-
- A1 Pulling Data from the Million Song Dataset
- A2 Calculating Most Likely Chords and Timbre Categories
- A3 Code to Compute Timbre Categories
- A4 Helper Methods for Calculations
-
- Bibliography
-
125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 
444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00
64
185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 
693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232
233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235
236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237
238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in
segments_pitches]242 return segments_pitches_new243
65
244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250
''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (chord family, template index) pair
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
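To make the template-matching idea concrete, here is a self-contained sketch of the same Pearson-style correlation score, using two hypothetical binary chord templates. The thesis defines its own CHORD_TEMPLATE_* tables elsewhere in this appendix; the template values below are purely illustrative.

```python
import numpy as np

# Hypothetical 12-bin binary templates (1 = pitch class sounds in the chord);
# illustrative stand-ins for the appendix's CHORD_TEMPLATE_* tables.
C_MAJOR_TEMPLATE = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)  # C, E, G
C_MINOR_TEMPLATE = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0], dtype=float)  # C, Eb, G

def template_correlation(template, pitch_vector):
    # The same quantity the loops above accumulate term by term:
    # a Pearson-style correlation, with 0.01 added to each standard
    # deviation to avoid division by zero on flat vectors.
    num = np.sum((template - template.mean()) * (pitch_vector - pitch_vector.mean()))
    den = (template.std() + 0.01) * (pitch_vector.std() + 0.01)
    return num / den

# A chroma vector with most of its energy on C, E and G should
# correlate more strongly with the major template.
observed = np.array([0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1])
scores = {"major": template_correlation(C_MAJOR_TEMPLATE, observed),
          "minor": template_correlation(C_MINOR_TEMPLATE, observed)}
best_chord = max(scores, key=lambda k: abs(scores[k]))
```

Comparing absolute values, as the appendix code does with abs(rho), means a strongly anti-correlated template can also win; whether that is desirable is a design choice of the original code, preserved here.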
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
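The helpers above label each time segment with a chord and a timbre category. One plausible way to combine those per-segment labels into a single fixed-length feature vector per song, sketched here as an assumption rather than the thesis's actual feature construction, is a normalized label histogram:

```python
from collections import Counter

def label_histogram(labels, num_categories):
    # Normalized histogram over per-segment category labels:
    # one fixed-length feature vector per song, entries summing to 1.
    counts = Counter(labels)
    total = float(len(labels))
    return [counts.get(c, 0) / total for c in range(num_categories)]

# Hypothetical per-segment timbre categories for one short song.
segment_timbre_cats = [0, 2, 2, 1, 2, 0]
hist = label_histogram(segment_timbre_cats, 4)
```

Fixed-length vectors like this are the kind of input a Dirichlet Process Gaussian Mixture Model can cluster directly, regardless of how many segments each song has.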
Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.
- Abstract
- Acknowledgements
- Contents
- List of Tables
- List of Figures
- 1 Introduction
- 1.1 Background Information
- 1.2 Literature Review
- 1.3 The Dataset
- 2 Mathematical Modeling
- 2.1 Determining Novelty of Songs
- 2.2 Feature Selection
- 2.3 Collecting Data and Preprocessing Selected Features
- 2.3.1 Collecting the Data
- 2.3.2 Pitch Preprocessing
- 2.3.3 Timbre Preprocessing
- 3 Results
- 3.1 Methodology
- 3.2 Findings
- 3.2.1 α = 0.05
- 3.2.2 α = 0.1
- 3.2.3 α = 0.2
- 3.3 Analysis
- 4 Conclusion
- 4.1 Design Flaws in Experiment
- 4.2 Future Work
- 4.3 Closing Remarks
- A Code
- A.1 Pulling Data from the Million Song Dataset
- A.2 Calculating Most Likely Chords and Timbre Categories
- A.3 Code to Compute Timbre Categories
- A.4 Helper Methods for Calculations
- Bibliography