
Dynamic Nonparametric Bayesian Models for Analysis of Music

Lu Ren¹, David Dunson², Scott Lindroth³ and Lawrence Carin¹

¹Department of Electrical and Computer Engineering
²Department of Statistical Science
³Department of Music
Duke University, Durham, NC 27708
Email: {lr,lcarin}@ee.duke.edu, [email protected], [email protected]

Abstract

The dynamic hierarchical Dirichlet process (dHDP) is developed to model complex sequential data, with a focus on audio signals from music. The music is represented in terms of a sequence of discrete observations, and the sequence is modeled using a hidden Markov model (HMM) with time-evolving parameters. The dHDP imposes the belief that observations that are temporally proximate are more likely to be drawn from HMMs with similar parameters, while also allowing for "innovation" associated with abrupt changes in the music texture. The sharing mechanisms of the time-evolving model are derived, and for inference a relatively simple Markov chain Monte Carlo sampler is developed. Segmentation of a given musical piece is constituted via the model inference. Detailed examples are presented on several pieces, with comparisons to other models. The dHDP results are also compared with a conventional music-theoretic analysis.

Key Words: dynamic Dirichlet process; hidden Markov model; mixture model; segmentation; sequential data; time series

1. Introduction

The analysis of music is of interest to music theorists, for aiding in music teaching, for analysis of human perception of sounds (Temperley, 2008), and for design of music search and organization tools (Ni et al., 2008). An example of the use of Bayesian techniques for analyzing music may be found in (Temperley, 2007). However, in (Temperley, 2007) it is generally assumed that the user has access to MIDI (musical instrument digital interface) files, which means that the analyst knows exactly what notes are sounding when. In the work presented here we are interested in processing the acoustic waveform directly; while the techniques developed here are of interest for music, they are also applicable to the analysis of general acoustic waveforms. For example, a related problem which may be addressed using tools of the type developed here is the segmentation of audio waveforms for automatic speech and


speaker recognition (e.g., for labeling different speakers in a teleconference (Fox et al., 2008)).

To motivate the work that follows, we start by considering a relatively well-known musical piece: "A Day in the Life" from the Beatles' album Sgt. Pepper's Lonely Hearts Club Band. The piece is 5 minutes and 33 seconds long, and the entire audio waveform is plotted in Figure 1. To process these data, the acoustic signal was sampled at 22.05 kHz and divided into 100 ms contiguous frames. Mel-frequency cepstral coefficients (MFCCs) (Logan, 2000) were extracted from each frame, these being effective for extracting perceptually important parts of the spectral envelope of audio signals (Jensen et al., 2006). The MFCC features are linked to spectral characteristics of the signal over the 100 ms window, and this mapping yields a 40-dimensional vector of real numbers for each frame. Therefore, after the MFCC analysis the music is converted to a sequence of 40-dimensional real vectors. The details of the model follow below, and here we only seek to demonstrate our objective. Specifically, Figure 2

Figure 1: The audio waveform of the Beatles music.

shows a segmentation of the audio waveform, where the indices on the figure correspond to data subsequences; each subsequence is defined by a set of 60 consecutive 100 ms frames. The results in Figure 2 quantify how inter-related any one subsequence of the music is to all others. We observe that the music is decomposed into clear contiguous segments of various lengths, and segment repetitions are evident. This Beatles song is a relatively simple example, for the piece has many distinct sections (vocals, along with clearly distinct instrumental parts). A music-theoretic analysis of the results in Figure 2 indicates that the segmentation correctly captures the structure of the music. In the detailed results presented below, we consider much "harder" examples. Specifically, we consider classical piano music for which there are no vocals, and for which distinct instruments are not present (there is a lack of timbral variety, which makes this a more difficult challenge). We also provide a detailed examination of the quality of the inferred music segmentation, based on music-theoretic analysis.
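The front-end framing described above (100 ms frames at 22.05 kHz, grouped into 60-frame subsequences) can be sketched in a few lines of numpy. This is an illustrative sketch of the framing step only; the sample rate and frame/subsequence lengths are taken from the text, and the MFCC computation itself is omitted.

```python
import numpy as np

def frame_and_group(signal, sample_rate=22050, frame_ms=100, frames_per_subseq=60):
    """Split an audio signal into contiguous frames and group them into subsequences."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 2205 samples per 100 ms frame
    n_frames = len(signal) // frame_len              # drop any trailing partial frame
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    # In the paper each frame is then mapped to a 40-dimensional MFCC vector;
    # here we stop at the framing step.
    n_subseq = n_frames // frames_per_subseq         # 60 frames = 6 s per subsequence
    subseqs = frames[: n_subseq * frames_per_subseq]
    subseqs = subseqs.reshape(n_subseq, frames_per_subseq, frame_len)
    return frames, subseqs

# A 5 min 33 s piece at 22.05 kHz (silence, for illustration):
signal = np.zeros(333 * 22050)
frames, subseqs = frame_and_group(signal)
```

For the Beatles example this yields 3330 frames and 55 complete 6-second subsequences.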

A typical goal of music analysis is to segment a given piece, with the objective of inferring inter-relationships among motives and themes within the music. We wish to achieve this task without setting a priori the number of segments or their lengths, motivating a nonparametric framework. A key aspect of our proposed model is an explicit imposition of the belief that the likelihood that two subsequences of music are similar (contained within the same or related segments) increases as they become more proximate temporally.

A natural tool for segmenting or clustering data is the Dirichlet process (DP) (Ferguson, 1973; Blackwell and MacQueen, 1973). In order to share statistical strength across


Figure 2: Segmentation of the audio waveform in Figure 1.

different groups of data, the hierarchical Dirichlet process (HDP) (Teh et al., 2006) has been proposed to model the dependence among groups through sharing the same set of discrete parameters ("atoms"), with the mixture weights associated with different atoms varying as a function of the data group. In DP-based mixture models of this form, it is assumed that the data are generated independently and are exchangeable. In the HDP it is assumed that the data groups are exchangeable. However, in many applications data are measured in a sequential manner, and there is information in this temporal character that should ideally be exploited; this violates the aforementioned assumption of exchangeability. For example, music is typically composed according to a sequential organization, and the long-term dependence in the time series, known as distance patterns in music theory, should be accounted for in an associated music model (Paiement et al., 2007; Aucouturier and Pachet, 2007).

The analysis of sequential data has been a longstanding problem in statistical modeling. With music as an example, Paiement (Paiement et al., 2007) proposed a generative model for rhythms based on the distributions of distances between subsequences; to annotate the changes in mixed music, Plotz (Plotz et al., 2006) used stochastic models based on the Snip-Snap approach, evaluating the Snip model for the Snap window at every position within the music. However, these methods are either based on one specific factor (rhythm) of the music (Paiement et al., 2007) or need prior knowledge of the music's segmentation (Plotz et al., 2006). Recently, a hidden Markov model (HMM) (Rabiner, 1989) was used to model monophonic music by assuming all the subsequences are drawn i.i.d. from one HMM (Raphael, 1999); alternatively, an HMM mixture (Qi et al., 2007) was applied to model the variable time-evolving properties of music, within a semi-parametric Bayesian setting. In both of these HMM music models the music was divided into subsequences, with an HMM employed to represent each subsequence; such an approach does not account for the expected statistical relationships between temporally proximate subsequences. By


considering one piece of music as a whole (avoiding subsequences), an infinite HMM (iHMM) (Teh et al., 2006; Ni et al., 2007) was proposed to automatically learn the model structure with countably infinite states. While the iHMM is an attractive model, it has limitations for the music modeling and segmentation of interest here, as discussed further below.

Developing explicitly temporally dependent models has recently been the focus of significant interest. A related work is the dynamic topic model (Blei and Lafferty, 2006; Wei et al., 2007), in which the model parameter at the previous time t − 1 is the expectation for the distribution of the parameter at the next time t, and the correlation of the samples at adjacent times is controlled by adjusting the variance of the conditional distribution. Unfortunately, the non-conjugate form of the conditional distribution requires approximations in the model inference.

Recently Dunson (Dunson, 2006) proposed a Bayesian dynamic model to learn the latent trait distribution through a mixture of DPs, in which the latent variable density can change dynamically in location and shape across levels of a predictor. Although this dynamic mixture of DPs has a full Bayesian solution, it has the drawback that mixture components can only be added over time, so one ends up with more components at later times. However, of interest for the application considered here, music has the property that characteristics of a given piece may repeat over time, which implies the possible repeated use of the same mixture component with time. Based on this consideration, a dynamic structure similar to that in (Dunson, 2006) is considered here to extend the hierarchical Dirichlet process (HDP) to incorporate time dependence.

The remainder of the paper is organized as follows. A brief summary of the DP and HDP is provided in Section 2.1. The proposed dynamic model structure is described in Section 2.2, with associated properties discussed in Section 2.3. Model inference is described in Section 2.4. Two detailed experimental results are provided in Section 3, followed by conclusions in Section 4.

2. Dynamic Hierarchical Dirichlet Processes

2.1 Background

As indicated in the Introduction, a given piece of music is mapped to a sequence of 40-dimensional real vectors via MFCC feature extraction. The MFCCs are the most widely employed features for processing audio signals, particularly in speech processing. To simplify the HMM mixture models employed here, each 40-dimensional real vector is quantized via vector quantization (VQ) (Gersho and Gray, 1992), and here the codebook is of dimension M = 16. For example, after VQ, the continuous waveform in Figure 1 is mapped to the sequence of codes depicted in Figure 3; it is a sequence of this type that we wish to analyze.
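The quantization step can be sketched with a simple Lloyd (k-means-style) codebook learner. This is a generic numpy illustration with synthetic features, not the specific codebook-design procedure used by the authors; the feature dimensions match the 40-dimensional MFCC vectors and M = 16 codewords from the text.

```python
import numpy as np

def lloyd_vq(features, M=16, n_iter=20, seed=0):
    """Learn an M-codeword codebook by Lloyd iterations and quantize each vector."""
    rng = np.random.default_rng(seed)
    # Initialize codewords from randomly chosen feature vectors.
    codebook = features[rng.choice(len(features), size=M, replace=False)]
    for _ in range(n_iter):
        # Assign each vector to its nearest codeword (Euclidean distance).
        d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        codes = d.argmin(axis=1)
        # Move each codeword to the mean of its assigned vectors.
        for m in range(M):
            if np.any(codes == m):
                codebook[m] = features[codes == m].mean(axis=0)
    return codes, codebook

# e.g., 1000 frames of 40-dimensional MFCC-like features (synthetic):
feats = np.random.default_rng(1).normal(size=(1000, 40))
codes, cb = lloyd_vq(feats)
```

After this step each frame is represented by a single code in {0, …, M − 1}, yielding a discrete sequence of the kind shown in Figure 3.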

The standard tool for the analysis of sequential data is the hidden Markov model (HMM) (Rabiner, 1989). For the discrete sequence of interest, given an observation


Figure 3: Sequence of code indices for the waveform in Figure 1, using a codebook of dimension M = 16.

sequence x = {x_t}_{t=1}^{T} with x_t ∈ {1, . . . , M}, the corresponding hidden state sequence is S = {s_t}_{t=1}^{T}, with s_t ∈ {1, . . . , I}. An HMM is represented by parameters θ = {A, B, π}, defined as

• A = {a_{ρξ}}, a_{ρξ} = Pr(s_{t+1} = ξ | s_t = ρ): state transition probabilities;
• B = {b_{ρm}}, b_{ρm} = Pr(x_t = m | s_t = ρ): emission probabilities;
• π = {π_ρ}, π_ρ = Pr(s_1 = ρ): initial state distribution.

To model the whole music piece with one HMM (Raphael, 1999), one may divide the sequence into a series of subsequences {x_j}_{j=1}^{J}, with x_j = {x_{jt}}_{t=1}^{T} and x_{jt} ∈ {1, . . . , M}. The joint distribution of the observation subsequences given the model parameters θ is then

p(x | θ) = ∏_{j=1}^{J} ∑_{S_j} π_{s_{j,1}} ∏_{t=1}^{T−1} a_{s_{j,t}, s_{j,t+1}} ∏_{t=1}^{T} b_{s_{j,t}, x_{j,t}}    (1)
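The sum over hidden state sequences S_j in (1) need not be enumerated explicitly; it can be computed with the standard forward recursion (Rabiner, 1989). A minimal numpy sketch (without the per-step rescaling a practical implementation would need for long sequences):

```python
import numpy as np

def hmm_likelihood(x, A, B, pi):
    """Forward algorithm: p(x | theta) for a discrete-emission HMM,
    summing over all hidden state sequences in O(T I^2) time."""
    alpha = pi * B[:, x[0]]                 # alpha_1(rho) = pi_rho * b_{rho, x_1}
    for t in range(1, len(x)):
        # alpha_t(xi) = [sum_rho alpha_{t-1}(rho) a_{rho, xi}] * b_{xi, x_t}
        alpha = (alpha @ A) * B[:, x[t]]
    return alpha.sum()

# Sanity check: with uniform two-state, two-symbol parameters,
# any length-T sequence has likelihood (1/2)^T.
A = np.full((2, 2), 0.5); B = np.full((2, 2), 0.5); pi = np.full(2, 0.5)
lik = hmm_likelihood([0, 1, 0], A, B, pi)
```

Under the independence assumption in (1), the joint likelihood of all J subsequences is simply the product of such per-subsequence terms.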

However, rather than employing a single HMM for a given piece, which is clearly overly simplistic, we allow the music dynamics to vary with time by letting

x_j ∼ F(θ_j),   j = 1, . . . , J,    (2)

which denotes that the subsequence x_j is drawn from an HMM with parameters θ_j. In order to accommodate dependence across the subsequences, we can potentially let θ_j ∼ G, with G ∼ DP(α_0 G_0), where G_0 is a base probability measure and α_0 is a non-negative real number (Ferguson, 1973). Sethuraman (Sethuraman, 1994) showed that

G = ∑_{k=1}^{∞} p_k δ_{θ*_k},   p_k = p̃_k ∏_{i=1}^{k−1} (1 − p̃_i)    (3)

where {θ*_k}_{k=1}^{∞} represent a set of atoms drawn i.i.d. from G_0 and {p_k}_{k=1}^{∞} represent a set of weights, with the constraint ∑_{k=1}^{∞} p_k = 1; each p̃_k is drawn i.i.d. from the beta


distribution Be(1, α_0). Since in practice the {p_k}_{k=1}^{∞} diminish quickly with increasing k (for reasonable choices of α_0), a truncated stick-breaking process (Ishwaran and James, 2001) is often employed, with a large truncation level K, to approximate the infinite stick-breaking process (in this approximation p̃_K = 1). We note that a draw G from DP(α_0 G_0) is discrete with probability one.
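The truncated stick-breaking construction in (3) is straightforward to simulate; a minimal numpy sketch (the values of K and α_0 here are the ones used illustratively in the text, but the routine is generic):

```python
import numpy as np

def truncated_stick_breaking(alpha0, K, rng):
    """Draw mixture weights p_1..p_K from a truncated stick-breaking process:
    p_k = v_k * prod_{i<k} (1 - v_i), with v_k ~ Be(1, alpha0) and v_K = 1."""
    v = rng.beta(1.0, alpha0, size=K)
    v[-1] = 1.0                      # truncation: last break takes all remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
p = truncated_stick_breaking(alpha0=1.0, K=40, rng=rng)
```

Setting the final stick fraction to one makes the truncated weights sum exactly to one, which is the approximation described above.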

2.2 Nonparametric Bayesian Dynamic Structure

Placing a DP on the distribution of the subsequence-specific HMM parameters θ_j allows for borrowing of information across the subsequences, but does not incorporate the belief that subsequences from proximal times should be more similar. Hence, we propose a more flexible dynamic mixture model in which

θ_j ∼ G_j,   G_j = ∑_{k=1}^{∞} p_{jk} δ_{θ*_k},   θ*_k ∼ H,    (4)

where the subsequence-specific mixture distribution G_j has weights that vary with j, represented as p_j. Including the same atoms for all j allows for repetition in the music structure across subsequences, with the varying weights allowing substantial flexibility. In order to account for relative subsequence positions in a piece, we propose a model that induces dependence between G_{j−1} and G_j by accommodating dependence in the weights.

Also motivated by the problem of accommodating dependence between a sequence of unknown distributions, Dunson (Dunson, 2006) proposed a dynamic mixture of Dirichlet processes. In particular, his approach characterized G_j as a mixture of G_{j−1} and an innovation distribution, which is assigned a Dirichlet process prior. The structure allows for the introduction of new atoms, while also incorporating atoms from previous times. There are two disadvantages to this approach in the music application. The first is that the atoms from early times tend to receive very small weight at later times, which does not allow recurrence of themes and grouping of subsequences that are far apart. The second is that atoms are only added as time progresses and never removed, which implies greater complexity in the music piece at later times.

We propose a dynamic HDP (dHDP) with the following structure:

G_j = (1 − w_{j−1}) G_{j−1} + w_{j−1} H_{j−1}    (5)

where G_1 ∼ DP(α_{01} G_0), H_{j−1} is called an innovation measure drawn from DP(α_{0j} G_0), and w_{j−1} ∼ Be(a_{w(j−1)}, b_{w(j−1)}). To impose sharing of the same atoms across all time, G_0 ∼ DP(γ H). The measure G_j is modified from G_{j−1} by introducing a new innovation measure H_{j−1}, and the random variable w_{j−1} controls the probability of innovation (i.e., it defines the mixture weights).


A draw G_0 ∼ DP(γ H) may be expressed as

G_0 = ∑_{k=1}^{∞} β_k δ_{θ*_k}    (6)

with the weights drawn β ∼ Stick(γ), where Stick(γ) corresponds to letting β_k = β̃_k ∏_{i=1}^{k−1} (1 − β̃_i) with β̃_k iid∼ Be(1, γ). Since the same atoms θ*_k iid∼ H are used for all G_j, it is also possible to share data between subsequences widely separated in time; this latter property is of interest when the music has temporal repetition, as is typical.

The measures G_1, H_1, . . . , H_{J−1} have their own mixture weights, yielding

G_1 = ∑_{k=1}^{∞} ζ_{1k} δ_{θ*_k},   H_1 = ∑_{k=1}^{∞} ζ_{2k} δ_{θ*_k},   . . . ,   H_{J−1} = ∑_{k=1}^{∞} ζ_{Jk} δ_{θ*_k},
ζ_j ind∼ DP(α_{0j} β),   j = 1, . . . , J    (7)

where, analogous to the discussion at the end of Section 2.1, the different weights ζ_j = {ζ_{jk}}_{k=1}^{∞} are independent given β, since G_1, H_1, . . . , H_{J−1} are independent given G_0.

To further develop the dynamic relationship from G_1 to G_J, we extend the mixture structure in (5) from group to group:

G_j = (1 − w_{j−1}) G_{j−1} + w_{j−1} H_{j−1}
    = ∏_{l=1}^{j−1} (1 − w_l) G_1 + ∑_{l=1}^{j−1} [ ∏_{m=l+1}^{j−1} (1 − w_m) ] w_l H_l
    = w̃_{j1} G_1 + w̃_{j2} H_1 + · · · + w̃_{jj} H_{j−1}    (8)

where w̃_{11} = 1, w_0 = 1, and for j > 1 we have w̃_{jl} = w_{l−1} ∏_{m=l}^{j−1} (1 − w_m), for l = 1, 2, . . . , j. It is easily verified that ∑_{l=1}^{j} w̃_{jl} = 1 for each j, with w̃_{jl} the prior probability that the parameters for subsequence j are drawn from the l-th component distribution, where l = 1, . . . , j indexes G_1, H_1, . . . , H_{j−1}, respectively. Based on the dependence induced here, we have an explicit form for each p_j, j = 1, . . . , J, in (4):

p_j = ∑_{l=1}^{j} w̃_{jl} ζ_l.    (9)

If all w_j = 0, all of the groups share the same mixture distribution G_1 and the model reduces to the Dirichlet process mixture model described in Section 2.1. If all w_j = 1, the model instead reduces to the HDP. Therefore, the dynamic HDP is more general than both the DP and the HDP, with each a special case. In the posterior computation, we treat the w_j as random variables and place beta priors on them for added flexibility. To encourage sharing between adjacent groups, the priors Be(w_j | a_w, b_w), for j = 1, . . . , J − 1, should be specified to have E(w_j) < 0.5.
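The component weights w̃_{jl} in (8), and the two limiting cases above, can be checked numerically. A small numpy sketch (the particular w values are arbitrary illustrations):

```python
import numpy as np

def dhdp_component_weights(w):
    """Given innovation probabilities w_1..w_{J-1}, return the J x J
    lower-triangular matrix of prior component weights w~_{jl} from Eq. (8):
    w~_{jl} = w_{l-1} * prod_{m=l}^{j-1} (1 - w_m), with w_0 = 1."""
    J = len(w) + 1
    W = np.zeros((J, J))
    W[0, 0] = 1.0                               # w~_{11} = 1
    for j in range(1, J):                       # 0-based row j is group j+1
        for l in range(j + 1):                  # 0-based col l is component l+1
            w_lm1 = 1.0 if l == 0 else w[l - 1]
            W[j, l] = w_lm1 * np.prod(1.0 - np.asarray(w[l:j]))
    return W

w = np.array([0.2, 0.5, 0.1])       # innovation probabilities for J = 4 groups
W = dhdp_component_weights(w)
# Each row sums to one; w -> 0 puts all mass on G_1 (DP mixture limit),
# while w -> 1 puts all mass on the newest innovation measure (HDP limit).
```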


2.3 Sharing Properties

To obtain insight into the dependence structure induced by the dHDP proposed in Section 2.2, this section presents some basic properties. Suppose G_0 is a probability measure on (Ω, B), with Ω the sample space of θ_j and B(Ω) the Borel σ-algebra of subsets of Ω. Then for any B ∈ B(Ω),

( G_j(B) | G_{j−1}, w_{j−1} ) =_D G_{j−1}(B) + Δ_j(B),    (10)

where Δ_j(B) = w_{j−1} [ H_{j−1}(B) − G_{j−1}(B) ] is the random deviation from G_{j−1} to G_j.

Theorem 1. Under the dHDP (8), for any B ∈ B(Ω) we have:

E[ Δ_j(B) | G_{j−1}, w_{j−1}, H, γ, α_{0j} ] = w_{j−1} [ H(B) − G_{j−1}(B) ]    (11)

V[ Δ_j(B) | G_{j−1}, w_{j−1}, H, γ, α_{0j} ] = w_{j−1}² (1 + γ + α_{0j}) H(B) (1 − H(B)) / [ (1 + α_{0j}) (1 + γ) ]    (12)

The proof of this theorem is provided in the Appendix. According to Theorem 1, given the previous mixture measure G_{j−1}, the expectation of the deviation from G_{j−1} to G_j is controlled by w_{j−1}, while the variance of the deviation depends on both w_{j−1} and the precision parameters γ and α_{0j}. Considering limiting cases, we observe the following:

• If w_{j−1} → 0, then G_j = G_{j−1};
• If G_{j−1} → H, then E[ G_j(B) | G_{j−1}, w_{j−1}, H, γ, α_{0j} ] = G_{j−1}(B);
• If γ → ∞ and α_{0j} → ∞, then V[ Δ_j(B) | G_{j−1}, w_{j−1}, H, γ, α_{0j} ] → 0.

Theorem 2. The correlation coefficient between the measures of two adjacent groups, G_{j−1} and G_j, for j = 2, . . . , J, is

Corr(G_{j−1}, G_j) = ( E[G_j(B) G_{j−1}(B)] − E[G_j(B)] E[G_{j−1}(B)] ) / ( V[G_j(B)] V[G_{j−1}(B)] )^{1/2}

                   = ∑_{l=1}^{j−1} [ w̃_{jl} w̃_{j−1,l} / (1 + α_{0l}) ] (α_{0l} + γ + 1) / { [ ∑_{l=1}^{j} ( w̃_{jl}² / (1 + α_{0l}) ) (α_{0l} + γ + 1) ]^{1/2} [ ∑_{l=1}^{j−1} ( w̃_{j−1,l}² / (1 + α_{0l}) ) (α_{0l} + γ + 1) ]^{1/2} }    (13)

The proof is given in the Appendix. To see how the correlation coefficient changes as a function of w and α_0, we analyze the dependence between these variables and Corr(G_1, G_2). Specifically, (i) in Figure 4(a) we plot the correlation coefficient Corr(G_1, G_2) as a function of w_1, with the precision parameters γ and α_0 fixed and equal to one; (ii) in Figure 4(b) we plot Corr(G_1, G_2) as a function of α_{02}, with w_1 = 0.5, α_{01} = 1 and γ = 10; (iii) in Figure 4(c) we plot Corr(G_1, G_2)


Figure 4: (a) Corr(G_1, G_2) as a function of w_1, with γ and α fixed. (b) Corr(G_1, G_2) as a function of α_{02}, with γ, α_{01} and w fixed. (c) Corr(G_1, G_2) as a function of both w_1 and α_{02}, with γ and α_{01} fixed.


as a function of both w_1 and α_{02}, given fixed values γ = 10 and α_{01} = 1. It is observed that smaller w induces higher correlation between adjacent groups; a larger innovation parameter α_0 also increases the correlation, since the variance of the deviation in (12) becomes smaller. If we assume α_{0l} = α for all l, the correlation coefficient takes the simple form

Corr(G_{j−1}, G_j) = ∑_{l=1}^{j−1} w̃_{jl} w̃_{j−1,l} / { [ ∑_{l=1}^{j} w̃_{jl}² ]^{1/2} [ ∑_{l=1}^{j−1} w̃_{j−1,l}² ]^{1/2} }.    (14)
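For equal precision parameters, (14) depends only on the weights w̃, so the behavior in Figure 4(a) is easy to reproduce numerically. A small numpy sketch (the weight values below are illustrative):

```python
import numpy as np

def corr_adjacent(wt_j, wt_jm1):
    """Corr(G_{j-1}, G_j) from Eq. (14), assuming alpha_{0l} = alpha for all l.
    wt_j   : length-j vector of weights w~_{j,1..j}
    wt_jm1 : length-(j-1) vector of weights w~_{j-1,1..j-1}"""
    wt_j, wt_jm1 = np.asarray(wt_j), np.asarray(wt_jm1)
    num = np.sum(wt_j[:-1] * wt_jm1)
    den = np.sqrt(np.sum(wt_j ** 2) * np.sum(wt_jm1 ** 2))
    return num / den

# For j = 2: w~_2 = (1 - w_1, w_1) and w~_1 = (1,), so
# Corr(G_1, G_2) = (1 - w_1) / sqrt((1 - w_1)^2 + w_1^2),
# which tends to 1 as w_1 -> 0, consistent with Figure 4(a).
c = corr_adjacent([0.8, 0.2], [1.0])
```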

2.4 Posterior Computation

A modification of the block Gibbs sampler (Ishwaran and James, 2001) is proposed for dHDP inference. The posterior distribution of the model parameters is expressed as Pr(θ*, w, ζ, β, α_0, γ, z, r | X), in which z and r are two indicator vectors defined below. As discussed in Section 2.1, a truncated stick-breaking process (Ishwaran and James, 2001) is employed to reduce the computational complexity. The important conditional posterior distributions used in implementing the MCMC algorithm are as follows:

1. The update of w_l, for l = 1, . . . , J − 1, from its full conditional posterior distribution has the simple form

(w_l | · · ·) ∼ Be[ a_w + ∑_{j=l+1}^{J} δ(r_{j(l+1)} = 1),  b_w + ∑_{j=l+1}^{J} ∑_{h=1}^{l} δ(r_{jh} = 1) ]    (15)

where {r_j}_{j=1}^{J} are indicator vectors and δ(r_{jl} = 1) denotes that θ_j is drawn from the l-th component distribution in (8). In (15) and in the results that follow, for simplicity, the distributions Be(a_{wj}, b_{wj}) are set with fixed parameters a_{wj} = a_w and b_{wj} = b_w for all time samples. The function δ(·) equals 1 if its argument is true and 0 otherwise.

2. The full conditional distribution of ζ̃_{lk}, for l = 1, . . . , J and k = 1, . . . , K, is updated under the conjugate prior ζ̃_{lk} ∼ Be[ α_{0l} β_k, α_{0l} (1 − ∑_{m=1}^{k} β_m) ], as specified in (Teh et al., 2006). The likelihood function associated with each ζ_l is proportional to ∏_{k=1}^{K} ζ_{lk}^{∑_{j=l}^{J} δ(r_{jl}=1, z_{jk}=1)}, where z_j is another indicator vector, with z_{jk} = 1 if subsequence x_j is allocated to the k-th atom (θ_j = θ*_k) and z_{jk} = 0 otherwise. Here K represents the truncation level and ζ_{lk} = ζ̃_{lk} ∏_{m=1}^{k−1} (1 − ζ̃_{lm}). The conditional posterior of ζ̃_{lk} then has the form

(ζ̃_{lk} | · · ·) ∼ Be[ α_{0l} β_k + ∑_{j=1}^{J} δ(r_{jl} = 1, z_{jk} = 1),  α_{0l} (1 − ∑_{m=1}^{k} β_m) + ∑_{j=1}^{J} ∑_{k′=k+1}^{K} δ(r_{jl} = 1, z_{jk′} = 1) ]    (16)


3. The update of the indicator vector r_j, for j = 1, . . . , J, is completed by generating samples from a multinomial distribution with entries

Pr(r_{jl} = 1 | · · ·) ∝ w_{l−1} ∏_{m=l}^{j−1} (1 − w_m) ∏_{k=1}^{K} [ ζ̃_{lk} ∏_{q=1}^{k−1} (1 − ζ̃_{lq}) · Pr(x_j | θ*_k) ]^{z_{jk}},   l = 1, . . . , j    (17)

with Pr(x_j | θ*_k) the likelihood of subsequence j given allocation to the k-th atom, θ_j = θ*_k. The posterior probabilities Pr(r_{jl} = 1) are normalized so that ∑_{l=1}^{j} Pr(r_{jl} = 1) = 1.

4. The sampling of the indicator vector z_j, for j = 1, . . . , J, is also from a multinomial distribution, with entries

Pr(z_{jk} = 1 | · · ·) ∝ ∏_{l=1}^{j} [ ζ̃_{lk} ∏_{k′=1}^{k−1} (1 − ζ̃_{lk′}) · Pr(x_j | θ*_k) ]^{r_{jl}},   k = 1, . . . , K.    (18)
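With one-hot indicators, the products in (17) and (18) each collapse to a single active term, so one Gibbs scan for a subsequence is simple to sketch. The following numpy sketch uses illustrative dimensions and precomputed likelihoods Pr(x_j | θ*_k); it is a schematic of steps 3–4 only, not the authors' sampler.

```python
import numpy as np

def stick_to_probs(zt):
    """zeta_{l,k} = zt_k * prod_{m<k} (1 - zt_m): truncated stick fractions to weights."""
    rem = np.concatenate(([1.0], np.cumprod(1.0 - zt[:-1])))
    return zt * rem

def gibbs_scan_subseq(wtil_j, Zt, lik_j, z_j, rng):
    """Resample r_j given z_j (Eq. 17), then z_j given r_j (Eq. 18).
    wtil_j : length-j prior weights w~_{j,1..j}
    Zt     : (j, K) stick fractions zeta~ for G_1, H_1, ..., H_{j-1}
    lik_j  : length-K likelihoods Pr(x_j | theta*_k)
    z_j    : current one-hot atom indicator for subsequence j."""
    jj, K = Zt.shape
    zeta = np.vstack([stick_to_probs(Zt[l]) for l in range(jj)])   # (j, K) weights
    k = int(np.argmax(z_j))                                        # occupied atom
    p_r = wtil_j * zeta[:, k] * lik_j[k]                           # Eq. (17), one-hot z_j
    r_j = np.eye(jj)[rng.choice(jj, p=p_r / p_r.sum())]
    l = int(np.argmax(r_j))
    p_z = zeta[l] * lik_j                                          # Eq. (18), one-hot r_j
    z_j = np.eye(K)[rng.choice(K, p=p_z / p_z.sum())]
    return r_j, z_j

rng = np.random.default_rng(0)
wtil_j = np.array([0.4, 0.3, 0.3])          # j = 3 available components (illustrative)
Zt = rng.uniform(0.1, 0.9, size=(3, 4))     # K = 4 atoms (illustrative)
lik_j = np.array([0.1, 0.5, 0.2, 0.2])      # stand-in HMM likelihoods
r_j, z_j = gibbs_scan_subseq(wtil_j, Zt, lik_j, np.eye(4)[1], rng)
```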

Other parameters specified in the dHDP setting, including {β_k}_{k=1}^{K} and the precision parameters γ and α_0, are updated using the auxiliary-variable methods introduced in (Teh et al., 2006; Escobar and West, 1995). The parametric form of each component θ*_k, for k = 1, . . . , K, is a hidden Markov model (HMM) with θ*_k = (A*_k, B*_k, π*_k). As in (Qi et al., 2007), the component parameters A*_k, B*_k and π*_k are assumed to be a priori independent, with the base measure having a product form with Dirichlet components for each of the probability vectors. The specifics of this specification are given in the Appendix. Since the indicator vector z_j, for j = 1, . . . , J, represents the membership shared across all the subsequences, we use this information to segment the music, by assuming that subsequences possessing the same membership should be grouped together. In order to overcome the label-switching issue that exists in Gibbs sampling, we use the similarity measure E(z′z) instead of the membership z in the results. Here E(z′z) is approximated by averaging the quantity z′z over multiple iterations, and in each iteration z′_j z_{j′} measures the degree of sharing between θ_j and θ_{j′} by integrating out the index of atoms. Related clustering representations of nonparametric models have been considered in (Medvedovic and Sivaganesan, 2002).
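The label-switching-invariant similarity measure E(z′z) can be approximated directly from collected samples. A numpy sketch with synthetic one-hot memberships (the data here are fabricated purely to illustrate the computation):

```python
import numpy as np

def similarity_matrix(z_samples):
    """Approximate E[z'z] by averaging over Gibbs samples.
    z_samples: (n_samples, J, K) one-hot membership matrices.  Entry (j, j')
    of the result is the posterior probability that subsequences j and j'
    occupy the same atom, which is invariant to relabeling of the atoms."""
    return np.einsum('njk,nmk->jm', z_samples, z_samples) / z_samples.shape[0]

# Two subsequences that always share an atom, and one that never does:
z = np.zeros((100, 3, 4))
z[:, 0, 1] = z[:, 1, 1] = 1.0   # subsequences 0 and 1 -> atom 1 every iteration
z[:, 2, 3] = 1.0                # subsequence 2 -> atom 3 every iteration
S = similarity_matrix(z)        # S[0, 1] = 1.0, S[0, 2] = 0.0
```

Matrices of exactly this form are what Figures 2, 6(b) and 7 display.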

3. Experimental Results

To apply the dHDP-HMM proposed in Section 2 to music data, we first complete the prior specification by choosing hyperparameter values. In particular, u, v and q in expression (23) are chosen as u = (1/I) 1_I, v = (1/M) 1_M and q = (1/I) 1_I, with 1_a denoting an a × 1 vector of ones. These priors are motivated by the results of (Ishwaran and Zarepour, 2002), which imply that each of the probability vectors converges to a Dirichlet process with precision parameter one as the number of elements increases. The Dirichlet process is appealing in favoring a few dominant elements, with the remaining probabilities close to zero. We set the truncation level for the DP at K = 40. The prior for w is chosen to encourage the groups to be shared, which


means E(w_l), for l = 1, . . . , J − 1, should be smaller than 0.5; consequently we set the prior ∏_{j=1}^{J−1} Be(w_j; a_w, b_w) with a_w = 1 and b_w = 5. Since the precision parameters γ and α_0 control the prior distribution on the number of clusters, these hyperparameter values should be chosen carefully. Here we place a Ga(1, 1) prior on γ and on each component of α_0. We obtained results that were consistent under moderate changes in the hyperparameter values. These parameter settings were not optimized, and any reasonable settings are likely to give results very similar to those presented below.

The Gibbs sampling algorithm is used to collect samples based on the posterior computation methods introduced in Section 2.4. In this context a 5,000-sample burn-in period is employed and 20,000 collection iterations are taken. The sampling algorithm has been tested with the diagnostic methods described in (Geweke, 1992; Raftery and Lewis, 1992) and has demonstrated rapid convergence and good mixing.

3.1 Application to Beethoven Sonata

We consider the first movement ("Largo – Allegro") of Beethoven's Sonata No. 17, Op. 31, No. 2 (the "Tempest"). The audio waveform of this piano music is shown in Figure 5.

Figure 5: Audio waveform of the first movement of Op. 31, No. 2.

The music is divided into contiguous 100 ms frames, and for each frame the quantized MFCC features are represented by one code from a codebook of size M = 16. Each subsequence is of length 60 (corresponding to 6 seconds), and for the Beethoven piece considered here there are 85 contiguous subsequences (J = 85). The lengths of the subsequences were chosen, in consultation with a music theorist, to be short enough to capture meaningful fine-scale segmentation of the piece. To represent the time dependence inferred by the model, the posterior of the indicator r is plotted in Figure 6(a), showing the mixture-distribution sharing relationship across the subsequences. Figure 6(b) shows the similarity measure E(z′z) for each pair of subsequences, in which a higher value represents a larger probability that the two corresponding subsequences are shared; here z (see Eq. (18)) is a column vector containing a one at the position associated with the component occupied at the current iteration and zeros otherwise.

For comparison, we now analyze the same music using a DP-HMM (Qi et al., 2007), an HDP-HMM (Teh et al., 2006) and an iHMM (Beal et al., 2002; Teh et al., 2006). In the DP-HMM, we use the model in (3), with F(θ) corresponding to an HMM with



Figure 6: Results of dHDP-HMM modeling for the Sonata No. 17. (a) The posterior distribution of the indicator variable r. (b) The similarity matrix E[z′z].

the same number of states as used in the dHDP; this model yields an HMM mixture model across the music subsequences, and the subsequence order is exchangeable. However, the long-term time dependence underlying the music’s coherence is not considered in the component-sharing mechanism. For the DP-HMM, we used the same specification of the base measure, H, as in the dHDP-HMM. A Gamma prior Ga(1,1) is taken as the hyper-prior for the precision parameter α0 in (3), and the truncation level is also set to 40. The DP-HMM inference was performed with MCMC sampling (Qi et al., 2007). We also consider a limiting case of the dHDP-HMM, for which all innovation weights are zero; this is referred to as an HDP-HMM, with inference performed as in the dHDP, simply with the innovation weights removed. As formulated, the HDP-HMM yields a posterior estimate on the HMM parameters (atoms) for each subsequence, while the DP-HMM yields a posterior estimate on the HMM parameters (atoms) across all of the subsequences. Thus, the HDP-HMM yields an HMM mixture model for each subsequence, with the mixture atoms shared across all subsequences; for the DP-HMM a single HMM mixture model is learned across all subsequences.

As in Figure 6, we plot the similarity measure E(z′z) for each pair of subsequences for the DP-HMM in Figure 7(a), and show the same measure for the HDP-HMM in Figure 7(b), for which the dynamic structure of the dHDP is removed; the other variables retain the same definitions in the DP-HMM and HDP-HMM. Compared with the dHDP result in Figure 6(b), we observe a clear difference: although the DP-HMM can also identify the repetitive patterns occurring before the 42nd subsequence, the HMM components shared across the whole piece jump from one to another between successive subsequences, which makes it difficult to segment the music and understand the development of the piece (e.g., the slow solo part between the 53rd and 69th subsequences is segmented into many small pieces by the DP-HMM);



Figure 7: Results of DP-HMM and HDP-HMM mixture modeling for the Sonata No. 17. (a) The similarity matrix E(z′z) from the DP-HMM. (b) The similarity matrix E(z′z) from the HDP-HMM.

similar behavior is observed in the HDP-HMM results (Figure 7(b)), and the music’s coherent nature is not captured by these models.

Additionally, we compare the dHDP-HMM with segmentation results produced by the iHMM (Beal et al., 2002; Teh et al., 2006). With the iHMM, the music is treated as one long sequence (all the subsequences are concatenated together sequentially) and a single HMM with an “infinite” set of states is inferred; in practice, a finite set of states is inferred as probable, as quantified in the state-number posterior. For the piece of music under consideration, the posterior on the number of states across the entire piece is depicted in Figure 8(a). The inference was performed using MCMC, as in (Teh et al., 2006), with hyper-parameters consistent with the models discussed above.

With the MCMC, we have a state estimate for each observation (codeword, for our discrete-observation model). For each of the subsequences considered by the other models, we employ the posterior on the state distribution to compute the Kullback-Leibler (KL) distance between every pair of subsequences. Since the KL distance is not symmetric, we define the distance between two state distributions as $D = \frac{1}{2}\left[E\left(D_{KL}(P_1\|P_2)\right) + E\left(D_{KL}(P_2\|P_1)\right)\right]$. Based on the collected samples, we use the averaged KL distance to measure the similarity between any two subsequences, plotted in Figure 8(b). Although this KL-distance matrix is somewhat noisy, we observe a time-evolving sharing between adjacent subsequences similar to that inferred by the dHDP. This is because the iHMM also characterizes the music’s coherence, since all of the sequential information is contained in one HMM. However, inferring this relationship requires a post-processing step with the iHMM, while



Figure 8: Analysis results for the piano music based on the iHMM. (a) Posterior distribution of the state number. (b) Approximate similarity matrix based on KL distance.

the dHDP infers these relationships as a direct aspect of the inference, also yielding “cleaner” results.
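The symmetrized KL distance used for Figure 8(b) can be sketched for discrete state distributions; a minimal version (the small `eps` guard is an added assumption, to avoid log-of-zero on empirical distributions):

```python
import math

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def sym_kl(p, q):
    """Symmetrized KL distance between two state distributions:
    D = (KL(p||q) + KL(q||p)) / 2."""
    return 0.5 * (kl(p, q) + kl(q, p))
```

Averaging this distance over the posterior samples of the per-subsequence state distributions yields the matrix in Figure 8(b).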

3.2 Analysis of Musical Form

We now compare the automated dHDP music segmentation with an analysis based on the underlying music (the latter performed by the third author). The temporal resolution is increased, of interest for comparison to music theory; in the two examples presented below the frames are 50 ms long, each subsequence is again of length T = 60 (3 s subsequences), and for the Beethoven piece we now have J = 129 contiguous subsequences. All other parameters are unchanged. In Figure 9 it is observed that the model does a good job of segmenting the large sectional divisions found in sonata form: exposition, exposition repeat, development, and recapitulation. Along the top of the figure, we note a parallel row of circles (white) and ellipses (yellow); these correspond to the “Largo” sections and are extracted well. The first two components of “Largo” (white circles) are exact repeats of the music, and this is reflected in the segmentation. Note that the yellow ellipses are still part of “Largo”, but they are slightly distinct from the white circles at left; this is due to the introduction of new key areas extended by recitative passages. The row of white squares corresponds to the “main theme” (“Allegro”), and these are segmented properly. The parallel row of blue rectangles corresponds to the second key area, which abruptly changes the rhythmic and melodic texture of the music. Note that the first two of these (centered about approximately sequences 30 and 58) are exact


repeats. The third appearance of this passage is in a different key, which is supported by the graph showing slightly lower similarity.

The row of three circles parallel to approximately sequence 30 corresponds to another sudden change in texture, characterized by melodic neighbor-tone motion emphasizing Neapolitan harmony (A-natural moving to Bb), followed by a harmonic sequence. The rightmost circle, in the recapitulation, is in a different key and consequently emphasizes neighbor motion on D-natural and Eb, yet it is still found similar to the earlier two appearances.

We also note that the A-natural / Bb neighbor motion is similar to subsequences near subsequence 20, and this may be because subsequence 20 also has strong neighbor-tone motion (E-natural to F-natural) in the left-hand accompaniment.

Finally, the bottom-right circle in Figure 9 identifies unique material that replaces the recapitulation of the main theme (“Allegro”), and its similarity to the main theme (around sequence 16) is correspondingly lower. The arrows at the bottom of Figure 9 identify “Allegro” interjections in the Largo passages, not all of which are in the same key.

As a second classical example, we consider the first movement of Mozart’s Sonata K. 333 (sampled in the same manner, yielding for this case J = 139 subsequences); this is again entirely a piano piece. The comparisons to the other algorithms are as discussed above for the Beethoven piece, and therefore here we focus on analyzing the quality of the segmentations. The results are presented in Figure 10, for which a segmentation of large sectional divisions (exposition, exposition repeat, development, and recapitulation) is rendered well.

Considering the annotations in Figure 10, the vertical arrows at the bottom identify unaccompanied melodic transitions in the right hand or sudden changes to soft dynamics, which are generally distinguished by the dHDP. The first row of white circles (near the top) corresponds to the beginning of the second theme, characterized by the distinctive chordal gesture in the key of the dominant, and this decomposition appears to be accurate. We note that the third appearance of this gesture, in the recapitulation, is in a different key, and the similarity is correspondingly lower. An example of an “error” is manifested in the row of white rectangles. These correspond to the closing theme; the left two rectangles (with high correlation between them) are correct, but the right rectangle does not show a correspondingly high correlation, and the theme is therefore not recognized in the recapitulation, when it appears in a different key (the tonic). The results in Figure 10 show a repeated high degree of similarity that is characteristic of Mozart piano sonatas; the consistent musical structure is occasionally permeated by exquisite details, such as a phrase transition (these, again, identified by the arrows at the bottom).

Summarizing the results in Figures 9 and 10, we believe the segmentation results are promising (particularly based on the careful analysis of the third author, which includes a level of musical detail deemed beyond the scope of this paper). While there are mistakes, there is a great deal of accuracy as well. Figures 9 and 10 reveal something characteristic about Beethoven and Mozart. Whereas Beethoven’s


[Figure 9 image: annotated similarity matrix; axes labeled “subsequence index”; section annotations: Exposition, Exposition Repeat, Development, Recapitulation; “Exact Repeat” marks.]

Figure 9: Annotated E(z′z) for the Beethoven Sonata. Descriptions of the annotations are provided in the text. The numbers along the vertical and horizontal axes correspond to the sequence index, and the color bar quantifies the similarity between any two segments.

music has more emphatic gestures and dramatic contrasts (evident in Figure 9), Mozart’s work tends to have a more consistent musical surface (reflected in the high degree of overall similarity in Figure 10). This is completely in keeping with the conventional understandings of the two composers. Nonetheless, the graph of the Mozart piece very frequently isolates exquisite details that emerge from the consistent surface. Again, this is typical of Mozart’s music, which gains its power by serenely moving from elegance to pathos and back without calling attention to it. By contrast, Beethoven will almost always dramatize such moments.

The Mozart results suggest that the algorithm recognizes prominent pitches and chords. Two passages that are melodically dissimilar but emphasize the same notes or chords are more likely to be linked by the dHDP than two passages that have


[Figure 10 image: annotated similarity matrix; axes labeled “subsequence index”; section annotations: Exposition, Exposition Repeat, Development, Recapitulation.]

Figure 10: Annotated E(z′z) for the Mozart. Descriptions of the annotations are provided in the text. The numbers along the vertical and horizontal axes correspond to the sequence index, and the color bar quantifies the similarity between any two segments.

exactly the same melody but are in different keys. This may account for some of the more conspicuous errors, but it also offers intriguing links between other passages. Finally, one should not discount the role the performers play. Different interpretations of the same piece may segment quite differently.

4. Conclusions

The dynamic hierarchical Dirichlet process (dHDP) has been developed for analysis of sequential data, with a focus on audio data from music. The framework assumes a parametric representation F(θ) to characterize the statistics of the data observed at a single point in time. The parameters θ associated with a given point in


time are assumed to be drawn from a mixture model with, in general, an infinite number of atoms, analogous to the Dirichlet process. The mixture models at time t−1 and time t are inter-related statistically. The model is linked to the hierarchical Dirichlet process (Teh et al., 2006) in the sense that the initial mixture model and the subsequent time-dependent mixtures are drawn from the same discrete distribution. This implies that the underlying atoms in the θ space associated with the aforementioned mixtures are the same, and what changes with time are the mixture weights. The model has the following characteristics: (i) with inferred probabilities, the underlying parameters associated with data at adjacent times are the same; and (ii) since the same underlying atoms are used in the mixtures at all times, the same atoms may be used at temporally distant times, allowing the capture of repeated patterns in temporal data. The underlying sharing properties (correlations) between observations at adjacent times have also been derived. Inference has been performed in an MCMC setting.

Examples have been presented on three musical pieces: a relatively simple piece from the Beatles, as well as two more complicated classical pieces. The classical pieces are more difficult to analyze because there are no vocals and a single instrument is generally used, and therefore the segmentation of such data is more subtle. The results of the classical-piece segmentations have been analyzed for their connection to music analysis, and in this connection the results are felt to be promising. While there were mistakes in the analysis of the Beethoven and Mozart pieces considered, there is a great deal of accuracy as well. The results clearly reveal meaningful characteristics about Beethoven and Mozart.

Concerning future research, for large data sets the MCMC inference engine employed here may not be computationally tractable. The graphical form of the dHDP is amenable to more-approximate inference engines, such as variational Bayesian (VB) analysis (Blei and Jordan, 2004). We intend to examine VB inference in future studies, and to examine its relative advantages in computational efficiency against its inference accuracy (relative to MCMC).


Appendix

A. Proof of Theorem 1.

$$
\begin{aligned}
V\!\left[G_j(B)\,\middle|\,G_{j-1}, w_{j-1}, H, \gamma, \alpha_{0j}\right]
&= w_{j-1}^2\, V\!\left[H_{j-1}(B)\right] \\
&= w_{j-1}^2\left[ E\!\left(V\!\left(H_{j-1}(B)\mid G_0(B)\right)\right) + V\!\left(E\!\left(H_{j-1}(B)\mid G_0(B)\right)\right) \right] \\
&= w_{j-1}^2\left[ E\!\left(\frac{G_0(B)\left(1-G_0(B)\right)}{1+\alpha_{0j}}\right) + V\!\left(G_0(B)\right) \right] \\
&= w_{j-1}^2\left[ \frac{1}{1+\alpha_{0j}}\,E\!\left(G_0(B)-G_0^2(B)\right) + \frac{H(B)\left(1-H(B)\right)}{1+\gamma} \right] \\
&= w_{j-1}^2\left[ \frac{1}{1+\alpha_{0j}}\left(H(B)-E\!\left(G_0^2(B)\right)\right) + \frac{H(B)\left(1-H(B)\right)}{1+\gamma} \right] \\
&= w_{j-1}^2\left[ \frac{1}{1+\alpha_{0j}}\left(H(B)-\frac{H(B)\left(1-H(B)\right)}{1+\gamma}-H(B)^2\right) + \frac{H(B)\left(1-H(B)\right)}{1+\gamma} \right] \\
&= w_{j-1}^2\left[ \frac{\left(1+\gamma+\alpha_{0j}\right)H(B)\left(1-H(B)\right)}{(1+\gamma)(1+\alpha_{0j})} \right]
\end{aligned}
$$

B. Proof of Theorem 2.

According to (8), $G_j = w_{j1}G_1 + \sum_{l=2}^{j} w_{jl}H_{l-1}$, where $w_{jl} = w_{l-1}\prod_{m=l}^{j-1}(1-w_m)$. Then, given $w_{j1},\ldots,w_{jj}$ and the base distribution $H$, the expectation of $G_j$ is:

$$
E\!\left[G_j(B)\right] = w_{j1}E\!\left[G_1(B)\right] + \sum_{l=2}^{j} w_{jl}E\!\left[H_{l-1}(B)\right] = \sum_{l=1}^{j} w_{jl}H(B) \qquad (19)
$$

Since, given $G_0$, the variance of $G_j(B)$ is $V\!\left[G_j(B)\right]=\sum_{l=1}^{j}\frac{w_{jl}^2}{\alpha_{0l}+1}\,G_0(B)\left(1-G_0(B)\right)$, we can obtain the variance of $G_j(B)$, taking the expectation over $G_0(B)$,


as follows:

$$
\begin{aligned}
V\!\left[G_j(B)\right] &= E\!\left[V\!\left(G_j(B)\mid G_0(B)\right)\right] + V\!\left[E\!\left(G_j(B)\mid G_0(B)\right)\right] \\
&= E\!\left[\sum_{l=1}^{j}\frac{w_{jl}^2}{\alpha_{0l}+1}\,G_0(B)\left(1-G_0(B)\right)\right] + V\!\left[\sum_{l=1}^{j} w_{jl}\,G_0(B)\right] \\
&= \sum_{l=1}^{j}\frac{w_{jl}^2}{\alpha_{0l}+1}\,E\!\left[G_0(B)-G_0^2(B)\right] + \sum_{l=1}^{j} w_{jl}^2\,V\!\left[G_0(B)\right] \\
&= \sum_{l=1}^{j}\frac{w_{jl}^2}{\alpha_{0l}+1}\left[H(B)-\left(V\!\left(G_0(B)\right)+H^2(B)\right)\right] + \sum_{l=1}^{j} w_{jl}^2\,V\!\left[G_0(B)\right] \\
&= \sum_{l=1}^{j} w_{jl}^2\left[\left(1-\frac{1}{1+\alpha_{0l}}\right)V\!\left[G_0(B)\right] + \frac{H(B)\left[1-H(B)\right]}{1+\alpha_{0l}}\right] \\
&= \sum_{l=1}^{j} w_{jl}^2\left[\frac{\alpha_{0l}}{1+\alpha_{0l}}\cdot\frac{H(B)\left[1-H(B)\right]}{1+\gamma} + \frac{H(B)\left[1-H(B)\right]}{1+\alpha_{0l}}\right] \\
&= \sum_{l=1}^{j}\frac{w_{jl}^2}{1+\alpha_{0l}}\cdot\frac{\alpha_{0l}+\gamma+1}{1+\gamma}\,H(B)\left[1-H(B)\right] \qquad (20)
\end{aligned}
$$

Additionally, given $G_0$, we can obtain:

$$
\begin{aligned}
&E\!\left[G_j(B)G_{j-1}(B)\right] - E\!\left[G_j(B)\right]E\!\left[G_{j-1}(B)\right] \\
&\quad = E\!\left[\left(w_{j1}G_1(B)+\cdots+w_{jj}H_{j-1}(B)\right)\left(w_{j-1,1}G_1(B)+\cdots+w_{j-1,j-1}H_{j-2}(B)\right)\right] \\
&\qquad - E\!\left[w_{j1}G_1(B)+\cdots+w_{jj}H_{j-1}(B)\right]E\!\left[w_{j-1,1}G_1(B)+\cdots+w_{j-1,j-1}H_{j-2}(B)\right] \\
&\quad = w_{j1}w_{j-1,1}V\!\left[G_1(B)\right] + \sum_{l=2}^{j-1} w_{jl}w_{j-1,l}V\!\left[H_{l-1}(B)\right] \\
&\quad = \sum_{l=1}^{j-1}\frac{w_{jl}w_{j-1,l}}{1+\alpha_{0l}}\cdot\frac{\alpha_{0l}+\gamma+1}{1+\gamma}\,H(B)\left[1-H(B)\right] \qquad (21)
\end{aligned}
$$

From the above analysis, the correlation coefficient between the distributions of adjacent groups, defined in (13), can be formulated as follows:

$$
\operatorname{Corr}\!\left(G_{j-1}(B),G_j(B)\right) = \frac{\displaystyle\sum_{l=1}^{j-1}\frac{w_{jl}\,w_{j-1,l}}{1+\alpha_{0l}}\left(\alpha_{0l}+\gamma+1\right)}{\left[\displaystyle\sum_{l=1}^{j}\frac{w_{jl}^2}{1+\alpha_{0l}}\left(\alpha_{0l}+\gamma+1\right)\right]^{1/2}\left[\displaystyle\sum_{l=1}^{j-1}\frac{w_{j-1,l}^2}{1+\alpha_{0l}}\left(\alpha_{0l}+\gamma+1\right)\right]^{1/2}} \qquad (22)
$$
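The correlation coefficient in Eq. (22) is straightforward to evaluate numerically from the unrolled weights and precision parameters; a minimal sketch with a hypothetical argument layout:

```python
import math

def dhdp_corr(w_prev, w_curr, alpha0, gamma):
    """Correlation between adjacent mixing distributions G_{j-1}, G_j.

    w_prev: weights w_{j-1,l}, length j-1;  w_curr: weights w_{jl},
    length j;  alpha0: precisions alpha_{0l};  gamma: top-level precision.
    """
    term = lambda l: (alpha0[l] + gamma + 1.0) / (1.0 + alpha0[l])
    num = sum(w_curr[l] * w_prev[l] * term(l) for l in range(len(w_prev)))
    d1 = math.sqrt(sum(w_curr[l] ** 2 * term(l) for l in range(len(w_curr))))
    d2 = math.sqrt(sum(w_prev[l] ** 2 * term(l) for l in range(len(w_prev))))
    return num / (d1 * d2)
```

As a sanity check, when the weights of group j reduce to those of group j−1 (zero innovation weight), the correlation is one.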

C. HMM Parameters Update.


Dirichlet distributions are used as the conjugate priors for these parameters (Qi et al., 2007):

$$
\begin{aligned}
\Pr(\mathbf{A}_k^*\mid \mathbf{u}) &= \prod_{\rho=1}^{I}\operatorname{Dir}\!\left(a_{\rho 1},\ldots,a_{\rho I};\,\mathbf{u}\right) \\
\Pr(\mathbf{B}_k^*\mid \mathbf{v}) &= \prod_{\rho=1}^{I}\operatorname{Dir}\!\left(b_{\rho 1},\ldots,b_{\rho M};\,\mathbf{v}\right) \qquad (23) \\
\Pr(\boldsymbol{\pi}_k^*\mid \mathbf{q}) &= \operatorname{Dir}\!\left(\pi_{1},\ldots,\pi_{I};\,\mathbf{q}\right)
\end{aligned}
$$

where I is the number of states of the HMM and M is the codebook size of the observations. To update the parameters of the HMMs, we also sample the hidden states. The posterior updating equations are as follows:

• $\Pr(\mathbf{A}_k^*\mid \mathbf{A}_{-k}^*,\mathbf{B}^*,\boldsymbol{\pi}^*,\mathbf{w},\zeta,\boldsymbol{\beta},\alpha_0,\mathbf{s},\mathbf{z},\mathbf{r},\mathbf{x}) = \prod_{\rho=1}^{I}\operatorname{Dir}\!\left(a_{\rho 1},\ldots,a_{\rho I};\,u_1+u_{k,\rho 1},\ldots,u_I+u_{k,\rho I}\right)$, where $u_{k,\rho\xi}=\sum_{j=1:\,z_{jk}=1}^{J}\sum_{t=1}^{T-1}\delta\!\left(s_{j,k,t}=\rho,\,s_{j,k,t+1}=\xi\right)$.

• $\Pr(\mathbf{B}_k^*\mid \mathbf{A}^*,\mathbf{B}_{-k}^*,\boldsymbol{\pi}^*,\mathbf{w},\zeta,\boldsymbol{\beta},\alpha_0,\mathbf{s},\mathbf{z},\mathbf{r},\mathbf{x}) = \prod_{\rho=1}^{I}\operatorname{Dir}\!\left(b_{\rho 1},\ldots,b_{\rho M};\,v_1+v_{k,\rho 1},\ldots,v_M+v_{k,\rho M}\right)$, where $v_{k,\rho m}=\sum_{j=1:\,z_{jk}=1}^{J}\sum_{t=1}^{T}\delta\!\left(s_{j,k,t}=\rho,\,x_{jt}=m\right)$.

• $\Pr(\boldsymbol{\pi}_k^*\mid \mathbf{A}^*,\mathbf{B}^*,\boldsymbol{\pi}_{-k}^*,\mathbf{w},\zeta,\boldsymbol{\beta},\alpha_0,\mathbf{s},\mathbf{z},\mathbf{r},\mathbf{x}) = \operatorname{Dir}\!\left(\pi_1,\ldots,\pi_I;\,q_1+q_{k,1},\ldots,q_I+q_{k,I}\right)$, where $q_{k,\rho}=\sum_{j=1:\,z_{jk}=1}^{J}\delta\!\left(s_{j,k,1}=\rho\right)$.

• $\Pr(s_{j,k,t}\mid \mathbf{s}_{-j},\mathbf{s}_{j,k,-t},\mathbf{A}^*,\mathbf{B}^*,\boldsymbol{\pi}^*,\mathbf{w},\zeta,\boldsymbol{\beta},\alpha_0,\mathbf{z},\mathbf{r},\mathbf{x}) \propto$
$$
\begin{cases}
\Pr(s_{j,k,1})\Pr(s_{j,k,2}\mid s_{j,k,1})\Pr(x_{j1}\mid s_{j,k,1}) & t=1 \\
\Pr(s_{j,k,t}\mid s_{j,k,t-1})\Pr(s_{j,k,t+1}\mid s_{j,k,t})\Pr(x_{jt}\mid s_{j,k,t}) & 2\le t\le T-1 \\
\Pr(s_{j,k,T}\mid s_{j,k,T-1})\Pr(x_{jT}\mid s_{j,k,T}) & t=T
\end{cases}
$$
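Each conjugate Dirichlet update above amounts to drawing a probability vector from a Dirichlet whose parameters are prior pseudo-counts plus observed counts; this can be done by normalizing independent Gamma draws. A sketch for one row of the transition matrix, with hypothetical argument names (the prior pseudo-counts u are assumed positive):

```python
import random

def sample_transition_row(counts, u):
    """Draw one row of an HMM transition matrix from its Dirichlet
    posterior Dir(u_1 + n_1, ..., u_I + n_I), where n_xi is the number
    of observed transitions from this state to state xi.

    Uses the standard construction: normalize independent Gamma draws.
    """
    draws = [random.gammavariate(ui + ni, 1.0) for ui, ni in zip(u, counts)]
    total = sum(draws)
    return [d / total for d in draws]
```

The emission rows (with codebook counts v) and the initial-state vector (with counts q) are sampled in exactly the same way.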

References

J.J. Aucouturier and F. Pachet. The influence of polyphony on the dynamical modelling of musical timbre. Pattern Recognition Letters, 28(5):654–661, 2007.

M.J. Beal, Z. Ghahramani, and C.E. Rasmussen. The infinite hidden Markov model. In Neural Information Processing Systems (NIPS), 2002.

D. Blackwell and J.B. MacQueen. Ferguson distributions via Polya urn schemes. Ann. Statist., 1(2):353–355, 1973.

D.M. Blei and M.I. Jordan. Variational methods for the Dirichlet process. In Proceedings of the International Conference on Machine Learning, 2004.

D.M. Blei and J.D. Lafferty. Dynamic topic models. In Proceedings of the International Conference on Machine Learning, 2006.

D.B. Dunson. Bayesian dynamic modeling of latent trait distributions. Biostatistics, 7(4):551–568, 2006.

M.D. Escobar and M. West. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90(430):577–588, 1995.

T.S. Ferguson. A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1:209–230, 1973.

E.B. Fox, E.B. Sudderth, M.I. Jordan, and A.S. Willsky. An HDP-HMM for systems with state persistence. In Proceedings of the 25th International Conference on Machine Learning (ICML), 2008.

A. Gersho and R.M. Gray. Vector Quantization and Signal Compression. Springer, 1992.

J. Geweke. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Bayesian Stat., 4:169–193, 1992.

H. Ishwaran and L.F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161–173, 2001.

H. Ishwaran and M. Zarepour. Exact and approximate sum representations for the Dirichlet process. Canadian Journal of Statistics, 30(2):269–283, 2002.

J.H. Jensen, M.G. Christensen, M.N. Murthi, and S.H. Jensen. Evaluation of MFCC estimation techniques for music similarity. In Proceedings of the 14th European Signal Processing Conference, 2006.

B. Logan. Mel frequency cepstral coefficients for music modeling. In International Symposium on Music Information Retrieval, 2000.

M. Medvedovic and S. Sivaganesan. Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics, 18(9):1194–1206, 2002.

K. Ni, D. Dunson, and L. Carin. Multi-task learning for sequential data via iHMMs and the nested Dirichlet process. In Proceedings of the 24th International Conference on Machine Learning, 2007.

K. Ni, J. Paisley, L. Carin, and D. Dunson. Multi-task learning for analyzing and sorting large databases of sequential data. Accepted by IEEE Trans. Signal Processing and available at http://www.ece.duke.edu/ lcarin/Papers.html, 2008.

J.-F. Paiement, Y. Grandvalet, S. Bengio, and D. Eck. A generative model for rhythms. In NIPS 2007 Music, Brain & Cognition Workshop, 2007.

T. Plotz, G.A. Fink, P. Husemann, S. Kanies, K. Lienemann, T. Marschall, M. Martin, L. Schillingmann, M. Steinrucken, and H. Sudek. Automatic detection of song changes in music mixes using stochastic models. In 18th International Conference on Pattern Recognition (ICPR’06), 2006.

Y. Qi, J.W. Paisley, and L. Carin. Music analysis using hidden Markov mixture models. IEEE Transactions on Signal Processing, 55(11):5209–5224, 2007.

L.R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

A.E. Raftery and S. Lewis. How many iterations in the Gibbs sampler? Bayesian Stat., 4:763–773, 1992.

C. Raphael. Automatic segmentation of acoustic musical signals using hidden Markov models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):360–370, 1999.

J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 2:639–650, 1994.

Y.W. Teh, M.I. Jordan, M.J. Beal, and D.M. Blei. Hierarchical Dirichlet processes. JASA, 101(476):1566–1581, 2006.

D. Temperley. Music and Probability. MIT Press, 2007.

D. Temperley. A probabilistic model of melody perception. Cognitive Science, 32:418–444, 2008.

X. Wei, J. Sun, and X. Wang. Dynamic mixture models for multiple time series. In Proceedings of the International Joint Conference on Artificial Intelligence, 2007.
