
• Analysis of Motifs in Carnatic Music: A Computational Perspective

    A THESIS

    submitted by

    SHREY DUTTA

    for the award of the degree

    of

MASTER OF SCIENCE (by Research)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

INDIAN INSTITUTE OF TECHNOLOGY, MADRAS. October 2015

  • THESIS CERTIFICATE

    This is to certify that the thesis entitled Analysis of Motifs in Carnatic Music:

    A Computational Perspective, submitted by Shrey Dutta, to the Indian Institute

    of Technology, Madras, for the award of the degree of Master of Science (by

    Research), is a bona fide record of the research work carried out by him under my

    supervision. The contents of this thesis, in full or in parts, have not been submitted

    to any other Institute or University for the award of any degree or diploma.

Dr. Hema A. Murthy
Research Guide
Professor
Dept. of Computer Science and Engineering
IIT-Madras, 600 036

    Place: Chennai

    Date:

  • ACKNOWLEDGEMENTS

    I joined IIT Madras with the intention of mastering the techniques used in machine

    learning. There is so much data available in digital form and I used to think that

machine learning techniques help in making sense of this data just as the human brain makes sense of the raw data received from different senses. As I started gaining a deep understanding of machine learning techniques, I realized that these techniques are not mature enough to mimic the human brain and thus should not be

    used blindly. I understood that the data needs to be represented in a sensible form

    which depends on the task under consideration. These techniques are designed to

    use this representation in achieving the desired task. After understanding this, I

    was able to use the existing techniques efficiently as well as design new techniques

    when required. This level of understanding was not possible without the immense

    knowledge and experience shared by my adviser, Prof. Hema A. Murthy, through

    endless captivating discussions.

    I would like to express my sincere gratitude to her for the excellent guidance,

    patience and providing me with an excellent atmosphere for doing research. She

    helped me to develop my background in signal processing and machine learning

    and to experience the practical issues beyond the textbooks. She has not only

    helped in improving my perspective towards research but also towards life.

    I would like to thank my collaborators Vignesh Ishwar, Krishnaraj Sekhar

    and Ashwin Bellur. The completion of this thesis would not have been possible

    without their contribution. They helped me in building datasets, carrying out the


  • experiments, analyzing results and in writing research papers.

    I am grateful to the members of my General Test Committee, Prof. C. Chandra

    Sekhar and Prof. C. S. Ramalingam, for their suggestions and criticisms with

    respect to the presentation of my work. I am also grateful for being a part of the

    CompMusic project. It was a great learning experience working with the members

    of this consortium.

    I would like to thank my music teachers Prof. M.V.N. Murthy and Niveditha

    Bharath. Prof. M.V.N. Murthy patiently taught me to play the instrument,

    Saraswati Veena, in his unique and excellent style. He always encouraged me

    to explore the music beyond what he used to teach in classes which certainly

    manifested my creativity. Madam Nivedita Bharath taught me to sing Carnatic

    music. She is an excellent and a very friendly teacher. Her classes were full of fun

    and excitement. Learning music from these wonderful teachers also helped me to

    better understand the work with respect to this thesis.

    I would like to thank Aashish, Anusha, Asha, Jom, Karthik, Manish, Padma,

    Praveen, Raghav, Rajeev, Sarala, Saranya, Sridharan, Srikanth and other members

    of Donlab for their help and unconditional support over the years. It would have

    been a lonely lab without them. I am also grateful to Alastair, Ajay and Sankalp

    from MTG Barcelona for always clearing my doubts and helping in my research. I

    would also like to acknowledge the help of Kaustuv from IIT Bombay. He always

    found time to answer my questions regarding Hindustani music.

I am also obliged to the European Research Council for funding the research under the European Union's Seventh Framework Programme, as part of the CompMusic project (ERC grant agreement 267583).

    I would like to thank all my friends at IIT Madras without whom the life at IIT


  • campus would have been dry and boring. If not for them, I would have finished

    my thesis much earlier. They have always been a source of refreshment during

    stressful times.

    I would like to thank my parents who have made many sacrifices so that I can

    get a good education and a good life. They have always tolerated my stubborn

    and rebellious nature which I am constantly trying to change. I wish to make them

    proud one day.

    Lastly, I would like to thank my loving brother Anubhav for always being

    an anchor of my life. It was he who has taken the responsibility of financially

    supporting our family at an early age and motivated me to pursue any path I wish

    to choose. I will always be grateful to him and I wish him all the happiness in life.


  • ABSTRACT

KEYWORDS: Carnatic Music, Pattern Discovery, Motif Spotting, Motif Discovery, Raga Verification, Stationary Points, Rough Longest Common Subsequence, Longest Common Segment Set

In Carnatic music, a collective expression of melodies that consists of svaras (ornamented notes) in a well defined order and phrases (aesthetic threads of ornamented notes) that have been formed through the ages defines a raga. Melodic motifs are those unique phrases of a raga that collectively give a raga its identity. These motifs are rendered repeatedly in every rendition of the raga, either compositional or improvisational, so that the identity of the raga is established. Different renditions of a motif make it challenging for a time-series matching algorithm to match them, as they differ slightly from each other. In this thesis, we design algorithmic techniques to automatically find these motifs and their different renditions, and then use the regions rich in these motifs to perform raga verification.

    The initial focus of the thesis is on finding different renditions of melodic

    motifs in an improvisational form of the raga called the alapana. Then we make

    an attempt to automatically discover these motifs from the composition lines. The

    results suggest that composition lines are indeed replete with melodic motifs.

    Using these composition lines, raga verification is performed. In raga verification,

    a melody (a single phrase or an aesthetic concatenation of many such phrases)

    along with a raga claim is supplied to the system. The system confirms or rejects

    the claim.


  • Two algorithms for time-series matching are proposed in this work. One is

    a modification of the existing algorithm, Rough Longest Common Subsequence

(RLCS). The other proposed algorithm, Longest Common Segment Set (LCSS), is completely novel and uses the in-between matched segments to give a holistic score.

    Using the proposed algorithm LCSS, an error rate of 12% is obtained for raga

    verification on a database consisting of 17 ragas.


  • TABLE OF CONTENTS

    ACKNOWLEDGEMENTS i

    ABSTRACT iv

    LIST OF TABLES x

    LIST OF FIGURES xi

    ABBREVIATIONS xii

    NOTATION xiii

    1 Introduction 1

    1.1 Overview of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . 1

    1.2 Contribution of the thesis . . . . . . . . . . . . . . . . . . . . . . . 3

    1.3 Organization of the thesis . . . . . . . . . . . . . . . . . . . . . . . 3

    2 Literature Survey 5

    3 Motif Spotting 20

    3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    3.2 Stationary Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    3.2.1 Method of obtaining Stationary Points . . . . . . . . . . . . 23

    3.3 Rough Longest Common Subsequence Algorithm . . . . . . . . . 25

    3.3.1 Rough match . . . . . . . . . . . . . . . . . . . . . . . . . . 25

    3.3.2 WAR and WAQ for local similarity . . . . . . . . . . . . . . 26

    3.3.3 Score matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

    3.4 Modified-Rough Longest Common Subsequence . . . . . . . . . . 27

    3.4.1 Rough and actual length of RLCS . . . . . . . . . . . . . . 28


  • 3.4.2 RWAR and RWAQ . . . . . . . . . . . . . . . . . . . . . . . 28

    3.4.3 Matched rate on the query sequence . . . . . . . . . . . . . 30

    3.5 A Two-Pass Dynamic Programming Search . . . . . . . . . . . . . 30

    3.5.1 First Pass: Determining Candidate Motif Regions using RLCS 31

    3.5.2 Second Pass: Determining Motifs from the Groups . . . . 32

    3.6 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

    3.7 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.7.1 Querying motifs in the alapanas . . . . . . . . . . . . . . . . 33

3.7.2 Comparison between RLCS and Modified-RLCS using longer motifs . . . . . . . . . . . . . . . 36

    3.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

    3.8.1 Importance of VAD in motif spotting . . . . . . . . . . . . 39

    3.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    4 Motif Discovery 41

    4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    4.2 Lines from the compositions . . . . . . . . . . . . . . . . . . . . . . 44

    4.3 Optimization criteria to find Rough Longest Common Subsequence 44

    4.3.1 Density of the match . . . . . . . . . . . . . . . . . . . . . . 45

    4.3.2 Normalized weighted length . . . . . . . . . . . . . . . . . 46

    4.3.3 Linear trend in stationary points . . . . . . . . . . . . . . . 46

    4.4 Discovering typical motifs of ragas . . . . . . . . . . . . . . . . . . 49

    4.4.1 Filtering to get typical motifs of a raga . . . . . . . . . . . . 49

    4.5 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

    4.6 Experiments and results . . . . . . . . . . . . . . . . . . . . . . . . 52

    4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    5 Raga Verification 56

    5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

    5.2 Dataset used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

    5.2.1 Extraction of pallavi lines . . . . . . . . . . . . . . . . . . . . 58


  • 5.2.2 Selection of cohorts . . . . . . . . . . . . . . . . . . . . . . . 58

    5.3 Longest Common Segment Set Algorithm . . . . . . . . . . . . . . 59

    5.3.1 Common segments . . . . . . . . . . . . . . . . . . . . . . . 60

    5.3.2 Common segment set . . . . . . . . . . . . . . . . . . . . . 62

    5.3.3 Longest Common Segment Set . . . . . . . . . . . . . . . . 62

    5.4 Raga Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

    5.4.1 Score Normalization . . . . . . . . . . . . . . . . . . . . . . 65

    5.5 Performance evaluation . . . . . . . . . . . . . . . . . . . . . . . . 66

    5.5.1 Experimental configuration . . . . . . . . . . . . . . . . . . 66

    5.5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

    5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

    5.6.1 Combining hard-LCSS and soft-LCSS . . . . . . . . . . . . 69

    5.6.2 Reduction of overlap in score distribution by T-norm . . . 69

    5.6.3 Scalability of raga verification . . . . . . . . . . . . . . . . . 70

    5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

    6 Conclusion 71

    6.1 Salient Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

    6.2 Criticism of the work . . . . . . . . . . . . . . . . . . . . . . . . . . 72

    6.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

  • LIST OF TABLES

    2.1 Svaras and their respective ratios to the base pitch S. . . . . . . . 6

    3.1 Dataset of alapanas . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.2 Short Motifs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.3 Long Motifs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.4 Short Motifs: Retrieved regions after the first pass . . . . . . . . . 34

    3.5 Long Motifs: Retrieved regions after the first pass . . . . . . . . . 35

    3.6 Short Motifs: Top 10 retrieved motifs after the second pass . . . . 35

    3.7 Long Motifs: Top 10 retrieved motifs after the second pass . . . . 35

    3.8 Long Motifs: Retrieved regions after the first pass . . . . . . . . . 37

    3.9 Long Motifs: Retrieved regions after the second pass . . . . . . . 38

3.10 Retrieved Groups after both the passes for modified-RLCS without VAD . . . . . . . . . . . . . . . 39

    4.1 D1: Dataset of composition lines . . . . . . . . . . . . . . . . . . . 50

    4.2 D1: Dataset for filtering . . . . . . . . . . . . . . . . . . . . . . . . 50

    4.3 D2: Dataset of composition lines . . . . . . . . . . . . . . . . . . . 51

    4.4 D2: Dataset for filtering . . . . . . . . . . . . . . . . . . . . . . . . 51

    4.5 D1:Similar motifs retrieved from composition lines . . . . . . . . . 52

    4.6 D1:Percentage of motifs preserved after filtering . . . . . . . . . . 53

    4.7 D2:Similar motifs retrieved from composition lines . . . . . . . . . 53

    4.8 D2:Percentage of motifs preserved after filtering . . . . . . . . . . 54

5.1 Details of the database used. Durations are given in approximate hours (h), minutes (m) or seconds (s). . . . . . . . . . . . 58

5.2 EER(%) for different algorithms using different normalizations on different datasets. . . . . . . . . . . . 67


5.3 Number of claims correctly verified by hard-LCSS only, by soft-LCSS only, by both and by neither of them for D1 and D2 using T-norm . . . . . . . . . . . . 69


  • LIST OF FIGURES

2.1 Comparing Pitch Histogram of Raga Sankarabharanam with its Hindustani and Western classical counterparts. . . . . . . . . . . . 7

2.2 Comparing a phrase in raga Sankarabharanam with gamakas and without gamakas . . . . . . . . . . . . 8

2.3 The gamakas in their true form are marked in a pitch contour of a melody . . . . . . . . . . . . 9

2.4 Tonic normalization of two similar phrases in raga sankarabharanam rendered at different tonics. . . . . . . . . . . . 10

2.5 Different renditions of a melodic motif in raga Kalyani and raga Kamboji. . . . . . . . . . . . 11

2.6 Different instances of a melodic motif in an alapana marked in red. 12

2.7 Extraction of stationary points and their interpolation to get a smooth pitch contour. . . . . . . . . . . . 14

    3.1 A Phrase with Stationary Points . . . . . . . . . . . . . . . . . . . . 22

    3.2 The Pitch and Stationary Point Histograms of the raga Kamboji . 23

    3.3 Original and Cubic Interpolated pitch contours . . . . . . . . . . . 24

3.4 a) True positive groups and false alarm groups score distribution for RLCS. b) True positive groups and false alarm groups score distribution for modified-RLCS. . . . . . . . . . . . 37

4.1 RLCS matching two sequences partially . . . . . . . . . . . . 42

4.2 Slopes of the linear trend of stationary points help in reducing the false alarms. The last three phrases are false alarms. . . . . . . . . . . . 47

5.1 An example of a common segment set between two sequences representing the real data . . . . . . . . . . . . 60

5.2 DET curves comparing the LCSS algorithm with different algorithms using different score normalizations . . . . . . . . . . . . 68

    5.3 Showing the effect of T-norm on the score distribution . . . . . . . 70


  • ABBREVIATIONS

    DTW Dynamic Time Warping

    UE-DTW Unconstrained Endpoint - Dynamic Time Warping

    LCS Longest Common Subsequence

    RLCS Rough Longest Common Subsequence

    RCS Rough Common Subsequence

    WAR Width Across Reference

    WAQ Width Across Query

    RWAR Rough Width Across Reference

    RWAQ Rough Width Across Query

    HMM Hidden Markov Model

    LSF Least Squares Fit

    LCSS Longest Common Segment Set

    Z-Norm Zero Normalization

    T-Norm Test Normalization

    EER Equal Error Rate

    VAD Voice Activity Detection


  • NOTATIONS

f Frequency value in hertz
d_{r_i,q_j} Distance between the reference's i-th value and the query's j-th value
T_d Threshold on the distance d_{r_i,q_j}
c_{i,j} Cost of RLCS till the reference's i-th value and the query's j-th value
wr_{i,j} WAR till the reference's i-th value and the query's j-th value
wq_{i,j} WAQ till the reference's i-th value and the query's j-th value
A weight on density
Matching rate
ca_{i,j} Actual length of RLCS till the reference's i-th value and the query's j-th value
wr_{i,j} RWAR till the reference's i-th value and the query's j-th value
wq_{i,j} RWAQ till the reference's i-th value and the query's j-th value
st A semitone in cents
Density of RCS S_XY
lw_{S_XY} Actual length of RCS S_XY
g_X Gaps in sequence X
g_Y Gaps in sequence Y
Threshold on the similarity score
s_X^{S_XY} Slope of the linear trend of stationary points in sequence X
Standard deviation of the linear trend's slope in sequence X
Similarity in the linear trend of stationary points in sequences X and Y
The number of gaps between two hard segments
Penalty issued for each gap
Imposter mean for the claim
Imposter standard deviation for the claim


  • CHAPTER 1

    Introduction

    1.1 Overview of the thesis

    In Carnatic music, a raga is a collective expression of melodies which consists of:

    1. A set of svaras (ornamented notes) ordered in a well defined manner.

2. Phrases (aesthetic threads of ornamented notes) as established by performances through the ages, as rendered in well known compositions.

    While there are some ragas, in particular, for which the first condition suffices,

    in general both these conditions are necessary and are used in practice. The

    phrases that collectively give a raga its identity are called melodic motifs. The

    melodic motifs are unique to a raga. Therefore, in any rendition of the raga, either

    compositional or improvisational, these motifs are rendered in order to establish

the raga's identity. Different renditions of a motif may differ slightly from each other, but these differences are enough to confuse a time-series matching algorithm. The goal

    of the thesis is to design algorithmic techniques to automatically find these motifs,

    their different renditions and, then use the regions replete with these motifs to

    perform raga verification.

    The initial part of the thesis is dedicated towards finding different renditions of

    melodic motifs in an improvisational form of raga called the alapana. This problem

    is known as motif spotting. A melodic motif, preselected by a musician, is used

    as a query and its different renditions are spotted using a matching algorithm.

  • Following this work, inspired by how trained listeners identify ragas, automatic

    discovery of motifs is attempted using certain segments of compositions which are

    supposed to be rich in motifs. Similar phrases are extracted from a number of such

    segments of the compositions in a particular raga. All similar phrases need not be

melodic motifs. Some of them could also appear in other ragas, thus violating the uniqueness property of the motifs. Therefore, these non-motif phrases are filtered out if they are found in composition lines of other ragas. Using this approach, various motifs are discovered for 14 ragas, thus confirming that these segments

    are replete with motifs. Therefore, using these segments of compositions, raga

    verification is performed. In raga verification, a melody (a single phrase or an

    aesthetic concatenation of many such phrases) along with a raga claim is supplied

    to the system. The system confirms or rejects the claim. Raga verification is

    performed by comparing the snippet of audio supplied with various composition

    lines of the claimed raga. The obtained score is matched against the scores obtained

    with composition lines of confusing ragas using score-normalization techniques.
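To make this flow concrete, the following is a minimal Python sketch of a single verification step. It assumes that some time-series matching score (such as the RLCS or LCSS matches described later) is available as a black-box function, and that the normalization is of the simple impostor mean and standard deviation kind suggested by the notation list; the function names and threshold below are illustrative only, not the thesis implementation.

```python
import numpy as np

def verify_raga_claim(test_pitch, claimed_raga_lines, cohort_lines_by_raga,
                      similarity, threshold):
    """Confirm or reject a raga claim for a supplied melody (sketch).

    test_pitch           : pitch contour of the supplied melody (in cents)
    claimed_raga_lines   : composition lines (pitch contours) of the claimed raga
    cohort_lines_by_raga : {raga_name: [pitch contours]} for the confusable ragas
    similarity           : any time-series matching score, e.g. RLCS or LCSS
    threshold            : decision threshold on the normalized score (assumed)
    """
    # Best match of the test melody against the claimed raga's composition lines.
    claim_score = max(similarity(test_pitch, line) for line in claimed_raga_lines)

    # Scores against the cohort (confusable) ragas act as an imposter distribution.
    imposter_scores = [max(similarity(test_pitch, line) for line in lines)
                       for lines in cohort_lines_by_raga.values()]

    # Normalize the claim score against the imposter mean and standard deviation.
    mu_i, sigma_i = np.mean(imposter_scores), np.std(imposter_scores)
    normalized = (claim_score - mu_i) / (sigma_i + 1e-9)

    return normalized >= threshold  # True: claim confirmed, False: claim rejected
```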

    Two algorithms for time-series matching are proposed in this work. One is

    a modification of the existing algorithm, Rough Longest Common Subsequence

    (RLCS). Another proposed algorithm, Longest Common Segment Set (LCSS), is

    completely novel and uses in between matched segments to give a holistic score.

    This algorithm comes in two forms: hard and soft. Hard-LCSS treats individ-

    ual matched segments separately irrespective of their lengths and distribution

    whereas, soft-LCSS can join two or more segments based on their lengths and

    distribution in order to compute a holistic score. Using the proposed algorithms,

    an error rate of 12% is obtained for raga verification on a database consisting of

    17 ragas.


  • 1.2 Contribution of the thesis

    The following are the main contributions of the thesis.

1. A measure based on the stationary points of the pitch contour is introduced that reduces the number of false alarms.

2. Modifications to an existing time-series matching algorithm, known as Rough Longest Common Subsequence, are proposed that reduce the number of false alarms and result in better localization.

3. A new time-series matching algorithm, known as Longest Common Segment Set, is proposed which performs better for the task of raga verification.

4. Approaches are proposed to discover melodic motifs automatically from the composition lines and to find their different renditions.

5. A system is designed to perform raga verification which is scalable to any number of ragas.

    1.3 Organization of the thesis

    The organization of the thesis is as follows: In Chapter 2, a brief background on

Carnatic music is given which is required for a better understanding of the work. Some of the related work on motif spotting, motif discovery and raga verification is also discussed in this chapter.

    Chapter 3 is dedicated towards describing the approach proposed in this the-

    sis to find different renditions of motifs. This chapter describes the quantization

of a pitch contour into stationary points, which preserves most of the raga information. This chapter also describes the modifications made to an existing time-series

    matching algorithm.

    Chapter 4 describes the proposed approach for automatically discovering the

    melodic motifs from the composition lines of the ragas. A measure is defined based

    on the stationary points which reduces the false alarms.


  • Chapter 5 is dedicated towards explaining the raga verification system. Auto-

    matic extraction of composition lines from a given composition is discussed. This

    chapter also describes the concept of cohorts for a raga. A new time-series match-

    ing algorithm, named as Longest Common Segment Set, is also proposed in this

    chapter.

    Finally, Chapter 6 summarizes the work and discusses the possible future work.


  • CHAPTER 2

    Literature Survey

    Carnatic music is an art music (often also referred to as classical music) tradition

commonly associated with four states of Southern India: Andhra Pradesh, Karnataka, Kerala and Tamil Nadu, and also some parts of Maharashtra. It is one of the two

    main sub-genres of Indian classical music. The other sub-genre is Hindustani Music

    which is mainly practiced in North India and also some parts of South India.

    A Carnatic music concert is an ensemble of the main performer (usually a

    vocalist), an accompanist (usually a violinist, occasionally a vainika or flautist) and

    percussionists (a single mridangam vidwan (main percussionist), or an ensemble

of percussionists). If the main percussionist is right handed, s/he sits to the right of the main artist and the violinist sits to the left. The positions are exchanged when

    the mridangam vidwan is left handed. All the performers sit on the stage cross

    legged without any support.

    The first musical sound of a concert is always of a tambura, a drone instrument

    which provides the tonic for the entire concert. The tambura (tanpura) is a string

    instrument that has four strings tuned to three pitches: P-(S)-(S)-S. S is the first

    pitch of an octave whereas P is 1.5 times the pitch of S which makes P the

    seventh pitch of the octave. The two (S)s, being twice the pitch of S, represent the

    first pitch of the second octave. When these four strings are played continuously

    in a conventional manner, the perceived sound, rich in harmonics, provides the

    harmonic base for the performance.

  • Table 2.1: Svaras and their respective ratios to the base pitch S.

Svara          S      R1     R2/G1  R3/G2  G3     M1     M2     P      D1     D2/N1  D3/N2  N3     (S)
Ratio with S   1      16/15  9/8    6/5    5/4    4/3    17/12  3/2    8/5    5/3    9/5    15/8   2
(decimal)      1.000  1.067  1.125  1.200  1.250  1.333  1.417  1.500  1.600  1.667  1.800  1.875  2.000
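The ratios in Table 2.1 become actual pitch values once a tonic is fixed. The short Python sketch below does this for an assumed tonic of 146.8 Hz (roughly D3); the tonic value and the rounding are illustrative only.

```python
import math

# Ratios to the base pitch S, taken from Table 2.1.
SVARA_RATIOS = {
    "S": 1.0, "R1": 16/15, "R2/G1": 9/8, "R3/G2": 6/5, "G3": 5/4,
    "M1": 4/3, "M2": 17/12, "P": 3/2, "D1": 8/5, "D2/N1": 5/3,
    "D3/N2": 9/5, "N3": 15/8, "(S)": 2.0,
}

def svara_table(tonic_hz):
    """Return (svara, frequency in Hz, offset from S in cents) for a given tonic."""
    rows = []
    for svara, ratio in SVARA_RATIOS.items():
        freq = tonic_hz * ratio
        cents = 1200 * math.log2(ratio)   # distance from the tonic in cents
        rows.append((svara, round(freq, 1), round(cents)))
    return rows

# Example: with the tonic S at 146.8 Hz, P comes out as 1.5 x 146.8 = 220.2 Hz.
for row in svara_table(146.8):
    print(row)
```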

The sound of S, also referred to as sruti, is a base pitch (tonic) with respect

    to which all other pitches are defined. These musical pitches in the context of

    Carnatic music are referred to as svaras. S (Sadja) and P (Panchama) are the

two of the seven svaras in an octave; the other five being R (Rishabha), G (Gandhara), M (Madhyama), D (Dhaivata) and N (Nisada). These five svaras have defined

    variability. They take two to three pitch positions in contrast to S and P as

    shown in Table 2.1. These manifestations of a svara, into multiple pitch positions,

are collectively defined as svarasthanas (svara positions) [27]. There are 12 pitch positions within an octave and the total number of svarasthanas is 16. Therefore,

    as shown in Table 2.1, there are overlaps between svarasthanas sharing the same

    pitch position. For example, Chatusruti Rishabha (R2) and Suddha Gandhara (G1)

    share the same pitch position. Therefore, this pitch position can be interpreted as

    one of these two svarasthanas depending on the context.

A svara is not perceived as a single point of frequency although it is referred to as a definitive pitch. It is perceived as movements within a small range of pitch

    values around a dominant mean. Figure 2.1 shows the histogram of pitch values

    in a melody of raga Sankarabharanam and compares it with its Hindustani (raga

    Bilawal) and Western classical (C-Major) counterparts that share the same scale.

    The pitch histogram is continuous for Carnatic music and Hindustani music but

    more or less discrete for Western music. It is clearly seen that the svaras are a range

of pitch values and this range is maximum for Carnatic music. This is because the intonation of a svara within the permissible range cognitively refers to only one svarasthana. For example, when the svarasthana Antara Gandhara (G3) is constantly


• Figure 2.1: Comparing the Pitch Histogram of Raga Sankarabharanam with its Hindustani and Western classical counterparts. (Plot: normalized density versus frequency in cents; curves for the Western classical C Major scale, Hindustani raga Bilawal and Carnatic raga Sankarabharanam, with the svara positions S, R2, G3, M1, P, D2 and N3 marked.)

    moving within a range, it is cognitively recognized as G3 even if it touches upon

    other svarasthanas. This concept where a svara is used to create a variability of

    movement in relation to the phraseology and melodic identity, creating a cognitive

    understanding of the svarasthana, is defined as a gamaka [27]. Therefore, a svara is

    a complete embodiment of svarasthana and the associated gamakas.

    There have been various documentations about the types and number of

    gamakas. In [13] these gamakas are classified into 13 types. A comparison of a

    phrase rendered in raga Sankarabharanam with gamakas and without gamakas is

    shown in Figure 2.2. The phrases are represented as time-frequency trajectories of

pitch values. This trajectory is also referred to as a pitch contour. From Figure 2.2 it is obvious that the deviations of the pitches from the norm are much higher for gamaka laden svaras. The notes of Western classical music are transformed to a


• Figure 2.2: Comparing a phrase in raga Sankarabharanam with gamakas and without gamakas. (Two panels, a) with gamakas and b) without gamakas: frequency in hertz versus time in seconds, tonic E.)

    symbolic notation due to their shorter pitch range. Sometimes, even the improvi-

    sations are also written in symbolic form but the gamaka laden svaras of Carnatic

    music are difficult to express in a symbolic form.

    It is precisely the presence of extensive gamakas that renders developing a sym-

    bolic representation of Carnatic music extremely difficult. It also poses significant

    challenges in the analysis of Carnatic music. These gamakas, however difficult to

represent, form the essential content of a melodic phrase. Since a svara is mostly rendered using gamakas, it was earlier thought that a melodic phrase could be quantized in terms of the 13 gamakas described in [13]. In practice, even though the svaras are sung using these gamakas, they are mostly present in a modified form rather than the true form described in [13]. Figure 2.3 shows the pitch contour

    of a melodic segment. It is clear that the gamakas present in their true form are

    very rare thus making it difficult to quantize a melody in terms of these gamakas.

    If a melody cannot be quantized in a sequence of gamakas, how can a melody be

    represented? Before addressing this question it is important to understand the

    concept of a raga and its various forms of renditions.


  • Figure 2.3: The gamakas in their true form are marked in a pitch contour of a melody

    The concept of a raga is very central to Carnatic music. A raga is a collective

    expression of melodies that consists of gamaka laden svaras and phrases (smaller

    melodic units) as rendered in well known compositions through the ages [26].

    Scale or svara sequence of a raga is defined by its arohana and avarohana. Aro-

    hana corresponds to the ascending order of svaras in the raga whereas avarohana

    corresponds to the descending order of svaras in terms of pitch. Tonic is crucial in

    the identification of a raga. A melody when heard without a referred tonic can be

    perceived as two different ragas depending on the svara that is considered as tonic

    [27].

    Figure 2.4 shows the pitch contours of two similar phrases in raga Sankarab-

    haranam. These phrases are rendered at different tonics. Any time series match-

ing algorithm will give a large error during matching even though these are the same phrases in the same raga but in different tonics. Therefore, before performing

    any kind of matching, the normalization of these phrases with respect to tonic is

    important. Tonic normalization of these phrases is shown in Figure 2.4.

    Raga identification can be done at different levels [27]. In some cases, a svara

    itself, even when rendered without a gamaka, may be sufficient to identify a raga.

    Identification of a raga can also be aided by different expression of a gamaka on a

    svara. Phrases of a raga may also be used to identify it.


• Figure 2.4: Tonic normalization of two similar phrases in raga sankarabharanam rendered at different tonics. (Two panels: the pitch contours in hertz at tonics E and F#, and the same contours after conversion to cents relative to each tonic.)

    A phrase is an aesthetic thread of the articulated and the unarticulated svaras

    in a raga. Phrases that collectively give a raga its identity are called melodic motifs.

    Each time a musician renders a phrase, its form varies even though the core identity

    is recognized. Figure 2.5 shows different renditions of a melodic motif in raga

Kalyani and raga Kamboji. The renditions differ slightly from each other, but these differences are enough to confuse a time-series matching algorithm. These improvisations

    should not make the phrase sound like a different raga. Sometimes, even with a

little improvisation, a phrase is perceived as a different raga. An example in [27] states that the phrase P D1 N2 D1 P M1 with an elongated N2 is common to ragas Thodi

    and Bhairavi. A gamaka on M1 makes it sound like Bhairavi and without the gamaka

    it sounds like raga Thodi. Although phrases can be sung at different speeds, for

    some phrases an increase in speed constricts the rendition of gamakas in a svara

    which can result in a different raga [27]. Other than variations within a phrase, the

    way each phrase connects to another also changes but the raga remains the same.

    A raga is rendered in various compositional and improvisational forms. Almost

    all compositional forms start with a section called pallavi. The pallavi is usually

    made up of one or two lines but is rich with melodic motifs of the raga [26]. The

    anupallavi is the second section of the composition. In anupallavi, the melodic

    movements in the higher octaves, with reference to the tonic, are present [26]. If


• Figure 2.5: Different renditions of a melodic motif in raga Kalyani and raga Kamboji.

    the anupallavi is present in a composition, it is always rendered after the pallavi

    before any other section. Charanam is another section found in most compositions

    which has a variable length depending on the type of composition [26].

    There are many improvisational forms in Carnatic music like alapana, tanam, ni-

    raval, kalpana svara, etc. We will discuss only alapana in detail as it is relevant to the

work. Alapana is generated by the musician's distinctive imagination and creativity. Alapana is the opening of a raga and brings out all the aspects of the raga without

    using other elements like tala. Every alapana begins with a phrase (melodic motif)

    that clearly establishes the identity of the raga. Once the identity is established, mu-

    sicians tend to further explore the raga. This exploration leads to small variations

    that start appearing in the renditions of svaras and phrases in the form of gamakas

    or slight deviations from known phrases. These variations lead to newer phrases

    in the raga that, over a period of time, can be used to identify the raga and can be

called melodic motifs of that raga [26]. The possible ways to move from one known phrase to another are numerous. The musician exploits this gap between two known phrases and aesthetically connects them with a new phrase [26].


• Figure 2.6: Different instances of a melodic motif in an alapana marked in red. (Panels: the pitch contour of an alapana, frequency in cents versus time in minutes, and three enlarged motif instances, Motif1, Motif2 and Motif3, frequency in cents versus time in seconds.)

    Therefore, the raga while having an aesthetic core is also an evolving entity

    through endless improvisation. In spite of this evolution, the identity of the raga

    remains intact in most cases. A raga is much like an evolving personality while

    the person remains the same. An example of an alapana showing the instances of

known phrases (melodic motifs) is shown in Figure 2.6. The melody discussed earlier in Figure 2.3 is also from an alapana, which made it clear that a melody in a raga cannot be quantized in terms of the 13 gamakas described in [13]. Now we will address the question asked earlier: if a melody cannot be quantized into a sequence of gamakas, how can a melody be represented? We know that every raga consists

    of the well known phrases (melodic motifs) that are unique to that raga and can

    be used to identify it. These phrases are also referred to as characteristic motifs,

    distinctive motifs and typical motifs. In any rendition of a raga, it is required that

    the characteristic motifs are rendered in order to establish the identity of the raga.

    If the motifs in a recording can be located, then these motifs can be used to index

    the recording. The focus of the initial part of the thesis is on locating motifs (as

    defined by a musician) in a continuous alapana.


  • In [20], the uniqueness of these characteristic motifs was established using a

    closed set motif recognition experiment using Hidden Markov Model (HMM).

    Following this work, we attempt to spot motifs given a long alapana interspersed

with motifs. From Figure 2.5, it is clear that motifs that seem identical from a perception perspective appear quite different (visually) when viewed as time

    series. Time series motif recognition has been attempted for Hindustani music.

    In [39], the onset point of the rhythmic cycle, emphasized by the beat of the tabla

    (an Indian percussion instrument), is used as a cue for potential motif regions. In

    another work [40], motif spotting is attempted in a Bandish (a type of composition

    in Hindustani music) using elongated notes (nyaas svara).

    Spotting motifs in a raga alapana is equivalent to finding a subsequence in a

    time-frequency trajectory of the alapana. Interestingly, the duration of these motifs

    may vary, but the relative duration of the svaras is preserved across the motif.

    The attempt in this thesis is to use pitch contours as a time series and employ

    time series pattern capturing techniques to identify the motif. The techniques

    are customized to use the properties of Carnatic music. There has been work

    done on time series motif recognition in fields other than music. In [36], a time

    series motif is defined and motif discovery is attempted using the Enumeration of

    Motifs through Matrix Approximation (EMMA) algorithm. In [4] and [28], time

    series motifs are discovered by adapting the random projection algorithm to time

    series data. In [3], a new warping distance called Spatial Assembling distance is

    defined and used for pattern matching in streaming data. In [30], music matching

    is attempted using a variant of the Longest Common Subsequence (LCS) algorithm

    called Rough Longest Common Subsequence (RLCS).

    Chapter 3 attempts similar time series motif matching for Carnatic Music.


Figure 2.7: Extraction of stationary points and their interpolation to get a smooth pitch contour.

    Searching for a 2-3 second motif (in terms of a pitch contour) in a 10 min alapana

    (also represented as a pitch contour) can be erroneous, owing to pitch estimation

    errors. To address this issue, the pitch contour of the alapana is first quantized

    to a sequence of stationary points (points in the pitch contour where the first-

    derivative is 0), as shown in Figure 2.7, which are meaningful in the context of a

    raga. The meaningfulness of these stationary points is validated by 13 listeners. In

    order to validate, stationary points were interpolated using cubic B-splines. The

    pitch trajectory corresponding to that of the interpolated curve was then used to

generate the melody. A similarity test was then performed to determine whether the original melodic segments and the melodic segments generated after interpolation were indeed similar. A very high similarity score of 7 out of 10 was obtained.

    The examples presented to the listeners for validation are available online1.

    To determine the location of the motif, a two-pass search is performed. In

    the first pass, Rough Longest Common Subsequence approach with modifications

    is used to find the region corresponding to the location of the motif using the

1. http://www.iitm.ac.in/donlab/motif_analysis.html

  • stationary points. Once the region is located, another pass is made on this region

    using the raw pitch contour instead of stationary points. Although the results

    using this approach were very promising, it required that musicians first identify

typical motifs manually. It was also observed that the number of false alarms was significantly high. Also, the correlation amongst musicians with respect to correct phrases was as high as 0.8 while, for false alarms, the correlation was as low as 0.4. High ranking false alarms were primarily due to partial matches with the given query. Many of these were considered as an instance of the queried motif by some musicians. Initially, motifs with shorter duration were used, and for these shorter motifs the inconsistency was high. Due to these problems, the scalability of this approach to more ragas was limited. This also illustrates that the notion of a typical motif itself is questionable. Nevertheless, there is a core using which the audience identifies ragas quickly and easily. The rest of the thesis focuses on this

    ability of listeners.

    As alapana is an improvisational segment, the rendition of the same motif could

    be different across alapanas especially among different schools. On the other hand,

    compositions in Carnatic music are rendered more or less in a similar manner. Al-

    though the music evolved through the oral tradition and fairly significant changes

    have crept into the music, renditions of compositions do not vary very significantly

    across different performers and schools. The number of variants for each line of the

    song can vary quite a lot though. Nevertheless, the typical motifs and the metre

    of motifs will be generally preserved. An attempt is therefore made, to determine

    the typical motifs automatically.

    It is discussed in [32] that not all repeating patterns are interesting and relevant.

    In fact, the vast majority of exact repetitions within a music piece are not musically


  • interesting. The algorithm proposed in [32] mostly generates interesting repeating

    patterns along with some non-interesting ones which are later filtered during post

    processing. This work is an attempt from a similar perspective. The only difference

    is that typical motifs of ragas need not be interesting to a listener. The primary

    objective for discovering typical motifs, is that these motifs can be used to index

    the audio of a rendition. For example, as discussed earlier, while performing an

    alapana of a raga, musicians bridge two well known motifs of that raga with new

    phrases using their creativity. These new phrases are musically more interesting

    as they are the result of an ever evolving raga. The known typical motifs can be

    used to index the alapana and the new phrases connecting them could be extracted.

    Typical motifs could also be used for raga classification.

    In Carnatic music, the composition still holds a very important position. Many

    artists change the phrases in the alapana based on the composition that is likely

    to follow. The proposed approach in this work generates similar patterns across

    composition lines of a raga. From these similar patterns, the typical motifs are

    filtered by using composition lines of other ragas. Motifs are considered typical of

    a raga if they are present in the composition lines of a given raga and absent from

composition lines of other ragas. This filtering approach is similar to the anti-corpus approach of Conklin [8, 9] for the discovery of distinctive patterns.
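A minimal sketch of this filtering idea is given below. It assumes a generic similarity function (for example an RLCS-style match) and a similarity threshold; the names and the exact decision rule are illustrative rather than the thesis implementation.

```python
def filter_typical_motifs(candidate_motifs, other_raga_lines, similarity, sim_threshold):
    """Anti-corpus style filtering of candidate motifs (sketch).

    candidate_motifs : similar phrases already discovered across the composition
                       lines of the raga under consideration
    other_raga_lines : composition lines of the other ragas (the anti-corpus)
    similarity       : a matching score such as RLCS; sim_threshold marks a "hit"
    """
    typical = []
    for motif in candidate_motifs:
        # A candidate is kept as a typical motif only if it is absent from the
        # composition lines of every other raga.
        found_elsewhere = any(similarity(motif, line) >= sim_threshold
                              for line in other_raga_lines)
        if not found_elsewhere:
            typical.append(motif)
    return typical
```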

    Most of the earlier work, regarding discovery of repeated patterns of interest in

music, is on Western music. In [22], B. Jansen et al. discuss the current approaches to repeated pattern discovery. It discusses string based methods and geometric methods for pattern discovery. In [31], Lie Lu et al. used constant Q transforms and proposed a similarity measure between musical features for doing repeated pattern discovery. In [32], Meredith et al. presented Structure Induction Algorithms (SIA) using a geometric approach for discovering repeated patterns that are musically interesting to the listener. In [6, 7], Collins et al. introduced improvements to Meredith's Structure Induction Algorithms. There has also been some significant work on detecting melodic motifs in Hindustani music by Joe Cheri Ross et al.

    [39]. In this approach, the melody is converted to a sequence of symbols and a

    variant of dynamic programming is used to discover the motif.

As mentioned before, the typical motifs can be used for raga classification, but when the number of ragas increases, the scalability of this approach becomes an

    issue. In Chapter 5, inspired by how the listener tries to identify raga during a

    concert, an attempt is made to mimic the same. During a concert, the performer

    usually begins by establishing the identity of the raga. When the musician is

    establishing the identity by rendering the raga, the listener narrows down the

search space from hundreds of ragas to a small likely subset of ragas. By further listening to the musician, the listener identifies the peculiarities, matches them with the shortlisted ragas and finally identifies the raga. First, to mimic the reduction of the search space, a raga recording is presented with a claim. The claim is the raga with which a listener has associated the recording. For every raga, a set of cohorts is identified

    by a musician. Cohorts are ragas that have similar phrases and can be confused

with the given raga. The cohort raga list is used to reduce the search space. The task that remains is to determine whether the claimed raga is correct. This is done by using a novel matching algorithm known as Longest Common Segment Set (LCSS) along with score normalization.

    There is no parallel in Western classical music to raga verification. The closest

task one can associate with it is cover song detection [14, 33, 43], where the objective is to detect the same song rendered by different musicians. In contrast, as discussed


  • in Chapter 2, two different renditions of the same motif may not be identical.

    Several attempts have been made earlier to identify ragas [5, 11, 12, 18, 20, 25, 29, 47].

    Most of these efforts have used small repertoires or have focused on ragas for which

    ordering is not important. In [47], the audio is transcribed to a sequence of notes

    and string matching techniques are used to perform raga identification. In [5],

    pitch-class and pitch-dyads distributions are used for identifying ragas. Bigrams

    on pitch are obtained using a twelve semitone scale. In [35], the authors assume that

    an automatic note transcription system for the audio is available. The transcribed

    notes are then subjected to HMM based raga analysis. In [25, 46], a template based

    on the arohana and avarohana is used to determine the identity of the raga. The

    frequency of the svaras in Carnatic music is seldom fixed. Further, as indicated

    in [48] and [49], the improvisations in extempore enunciation of ragas can vary

    across musicians and schools. This behaviour is accounted for in [23, 24, 29] by

    decreasing the binwidth for computing melodic histograms. In [29], steady note

    transcription along with n-gram models is used to perform raga identification. In

    [11] chroma features are used in an HMM framework to perform scale independent

raga identification, while in [12] a hierarchical random forest classifier is used to

    match svara histograms. The svaras are obtained using the Western transcription

    system. These experiments are performed on 4 to 8 different ragas of Hindustani

music. In [18], an attempt is made to perform raga identification using semi-continuous Gaussian mixture models. This will work only for ragas with a linear

    ordering of svaras.

    Recent research indicates that a raga is characterised best by a time-frequency

    trajectory rather than a sequence of quantised pitches [20, 38, 39, 45]. In [38, 39],

    the sama of the tala (emphasised by the bol of tabla) is used to segment a piece. The

    repeating pattern in a bandish in Hindustani Khayal music is located using the


  • sama information. In [20, 38], motif identification is performed for Carnatic music.

    Motifs for a set of five ragas are defined and marked carefully by a musician.

    Motif identification is performed using hidden Markov model (HMM) trained

    for each motif. Similar to [39], motif spotting in an alapana in Carnatic music is

    performed in Chapter 3. In [45], a number of different similarity measures for

matching melodic motifs of Indian music were attempted. It was shown that the

    intra pattern type variance of the melodic motifs is higher for Carnatic music in

    comparison with that of Hindustani music. It was also shown that the similarity

    obtained is very sensitive to the measure used. All these efforts are ultimately

    aimed at obtaining typical signatures of ragas. It is shown in Chapter 3 that there

can be many signatures for a given raga. To alleviate this problem, in Chapter 4 an attempt was made to obtain as many signatures as possible for a raga by comparing lines

    of compositions. Here again, it was observed that the typical motif detection was

    very sensitive to the distance measure chosen. Using typical motifs/signatures for

raga identification is not scalable when the number of ragas under consideration increases. In raga verification, as the task of identifying ragas narrows to a small number of ragas, it is scalable to any number of new ragas.


  • CHAPTER 3

    Motif Spotting

    3.1 Introduction

    A raga in Carnatic music can be characterised by a set of distinctive motifs. Dis-

    tinctive motifs can be characterised by the trajectory of inflected svaras over time.

    These motifs are of utmost aesthetic importance to the raga. Carnatic music is

    a genre abundant with compositions. These compositions are replete with many

    distinctive motifs. These motifs are used as building blocks for extempore improvi-

    sational pieces in Carnatic music. These motifs can also be used for distinguishing

    between two ragas, and also for archival and learning purposes. The objective of

    the work presented in this chapter is to spot the location of the distinctive motifs in

    an extempore enunciation of a raga called the alapana. In Carnatic music, the motifs

    are laden with gamakas [27]. In addition, the motifs are similar across musicians

    but not necessarily identical. The duration of the motifs can also vary quite signif-

    icantly although the rhythm may be preserved. The query motif in general is very

    short in duration compared to that of the test music segment. Several factors need

to be considered when dealing with this problem, namely: selection of features, time complexity, tolerance to noise, tolerance to speed variation, allowing partial matches or rough matches rather than exact matches, timbre, etc. [17, 30, 50].

    In this chapter, pitch is used as the main feature for the task of motif spot-

    ting. Substantial research exists on analysing different aspects of Carnatic music

    computationally, using pitch as a feature. In [28], gamakas are characterized and

  • analysed using pitch contours. In [21], tuning of Indian classical music is studied

    using pitch histograms. In [48], the motifs are extensively studied in the raga Thodi

    using pitch histograms and pitch contours. All of the above prove the relevance

    and importance of pitch as a feature for computational analysis of Carnatic music.

There are a number of dynamic programming techniques, namely the Dynamic

    Time Warping (DTW), the Longest Common Subsequence (LCS) and their variants,

    which are used for similar music matching tasks. DTW takes care of the speed

    variations due to warping but forces the match from end-to-end of both the query

    and the test sequences. Even unconstrained endpoint DTW will align an entire

    query with a part of the test sequence [16]. In motif-spotting, there can be instances

    where one can expect that most of the query is roughly matched with a part of

    the test sequence. Although LCS does not force the match between query and test

    to be end-to-end, it does not give importance to local similarity. Rough Longest

    Common Subsequence (RLCS) addresses the issue of local similarity where some

    leeway is given for partial query matches [30]. Other than partial query matches,

    when the characteristic motif, for example, Sa Ni Da Pa Da is rendered as Sa

    Ri Ni Da Pa Da, RLCS gives a good match since it gives the longest matched

    subsequence.

    3.2 Stationary Points

    The task therefore is to attempt automatic spotting of a motif that is queried. The

    motif is queried against a set of alapanas of a particular raga to obtain locations of

    the occurrences of the motif. The task is non-trivial since no particular rhythm

is maintained in an alapana, nor is it accompanied by a percussion instrument.


  • Figure 3.1: A Phrase with Stationary Points

    Figure 2.6 shows repetitive occurrences of motifs in a piece of music. An enlarged

    view of the motif is also shown. Since the alapana is much longer than the motif,

    searching for a motif in an alapana is like searching for a needle in a haystack. After

    an analysis of the pitch contours and discussions with professional musicians, it

    was conjectured that the pitch contour can be quantized at stationary points. The

    conjecture was confirmed as explained in Chapter 2. Figure 3.1 shows an example

    phrase of the raga Kamboji with the stationary points highlighted.

    Musically, the stationary points are a measure of the extent to which a particular

    svara is intoned. In Carnatic music since svaras are rendered with gamakas, there is

    a difference between the notation and the actual rendition of the phrase. However,

    there is a one to one correspondence with the stationary point frequencies and

    what is actually rendered by the musician (Figure 3.1). Figure 3.2 shows the pitch

    histogram and the stationary point histogram of an alapana of the raga Kamboji.

    The similarity between the two pitch histograms vindicates our conjecture that

    stationary points are important.

Figure 3.2: The Pitch and Stationary Point Histograms of the raga Kamboji

    3.2.1 Method of obtaining Stationary Points

    Carnatic music is a heterophonic musical form. In a Carnatic music vocal concert,

    a minimum of two accompanying instruments play simultaneously along with

    the lead artist. These are the violin and the mridangam (a percussion instrument

    in Carnatic music). Carnatic music is performed at a fixed tonic[2] to which all

    instruments are tuned. The tonic is chosen by the lead artist and is maintained

    throughout the performance by an instrument called the Tambura as discussed in

    Chapter 2. The simultaneous performance of many instruments in addition to the

    voice renders pitch extraction of the predominant voice a tough task. This leads

    to octave errors and other erroneous pitch values. For this task it is necessary that

    pitch be continuous. After experimenting with various pitch algorithms, it was

    observed that the Melodia-Pitch Extraction algorithm [41] produced the fewest

    errors. This was verified after re-synthesis using the pitch contours. In case of

    an octave error or any other such pitch related anomaly, the algorithm replaces

    the erroneous pitch values with zeros. The stationary points are obtained by

    processing the pitch contour extracted from the waveform. The pitch extracted

Figure 3.3: Original and Cubic Interpolated pitch contours

    is converted to the cent scale using (3.1) to normalise with respect to the tonic of

    different musicians.

centFrequency = 1200 \log_2\left(\frac{f}{f_{tonic}}\right) \qquad (3.1)
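As a small illustration (not part of the thesis implementation), the conversion in (3.1) can be written as:

```python
import numpy as np

def hz_to_cents(f_hz, tonic_hz):
    """Convert pitch values in Hz to cents relative to the tonic, as in (3.1)."""
    return 1200.0 * np.log2(np.asarray(f_hz, dtype=float) / tonic_hz)

# A pitch one octave above the tonic maps to 1200 cents.
print(hz_to_cents([146.8, 293.6], tonic_hz=146.8))  # [0. 1200.]
```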

    Least Squares Fit (LSF)[37] was used to compute the slope of the pitch extracted.

    The zero crossings of the slope correspond to the stationary points (Figure 3.1). A

    Cubic Hermite interpolation[15] was then performed with the initial estimation of

    stationary points to get a continuous curve (Figure 3.3). The stationary points are

    then again estimated from this continuous curve.
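A rough sketch of this procedure is given below, assuming a voiced, tonic-normalised pitch contour sampled at a uniform hop. The window length for the least-squares slope and the choice of interpolation routine are illustrative assumptions, not the exact settings used in the thesis.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator  # monotone cubic Hermite interpolation [15]

def stationary_points(cents, half_win=5):
    """Estimate stationary points of a pitch contour given in cents.

    The local slope is obtained from a least-squares line fit in a short
    window; zero crossings of the slope are taken as stationary points."""
    cents = np.asarray(cents, dtype=float)
    n = len(cents)
    t = np.arange(n)
    slope = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_win), min(n, i + half_win + 1)
        slope[i] = np.polyfit(t[lo:hi], cents[lo:hi], 1)[0]   # least-squares slope
    idx = np.where(np.diff(np.sign(slope)) != 0)[0]           # slope zero crossings
    return idx, cents[idx]

def interpolate_through(cents, idx):
    """Cubic Hermite interpolation through the stationary points (cf. Figure 3.3)."""
    cents = np.asarray(cents, dtype=float)
    t = np.arange(len(cents))
    return PchipInterpolator(t[idx], cents[idx])(t)
```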

3.3 Rough Longest Common Subsequence Algorithm

    Rough Longest Common Subsequence (RLCS), a variant of Longest Common

    Subsequence (LCS), performs an approximate match between a reference sequence

    and a query sequence while retaining the local similarity [30]. It introduces three

major changes to LCS, namely: a rough match; width-across-reference (WAR) and
width-across-query (WAQ) for local similarity; and a score matrix.

    3.3.1 Rough match

    In the recurrence function of LCS, the cost function is incremented by 1 when there

is an exact match. In RLCS, when the distance between a reference point, $r_i$, and
a query point, $q_j$, is less than a threshold, $T_d$, they are said to be roughly matched,
denoted $r_i \approx q_j$, i.e. $d(r_i, q_j) < T_d \Rightarrow r_i \approx q_j$, where $d(r_i, q_j)$ is the distance between $r_i$ and $q_j$.
The cost is incremented by a number, $\sigma_{i,j}$, between 0 and 1 instead of 1, based on
how good the match is, as shown in (3.2).

\sigma_{i,j} = 1 - \frac{d(r_i, q_j)}{T_d} \qquad (3.2)

    The cost is estimated using the following recurrence:

c_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
c_{i-1,j-1} + \sigma_{i,j} & \text{if } r_i \approx q_j \\
\max(c_{i-1,j},\, c_{i,j-1}) & \text{if } r_i \not\approx q_j
\end{cases}
\qquad (3.3)
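A minimal sketch of (3.2) and (3.3) follows; the variable names (including sigma for the rough-match weight) are illustrative choices of this sketch, not the thesis's code.

```python
import numpy as np

def rough_match_weight(d, Td):
    """Weight sigma of (3.2): 1 for a perfect match, falling to 0 at the threshold."""
    return 1.0 - d / Td if d < Td else 0.0

def rlcs_cost(ref, query, dist, Td):
    """Cost matrix of (3.3): the 'rough length' of the RLCS of ref and query."""
    n, m = len(ref), len(query)
    c = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], query[j - 1])
            if d < Td:                        # rough match
                c[i, j] = c[i - 1, j - 1] + rough_match_weight(d, Td)
            else:                             # no match: carry the best neighbour
                c[i, j] = max(c[i - 1, j], c[i, j - 1])
    return c
```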

    In LCS the cost gives the length of the Longest Common Subsequence. The cost

    of RLCS is not incremented by 1 but it represents the length of the Rough Longest

Common Subsequence. Later, it is argued that this length is actually a rough length

    of RLCS rather than its actual length.

    3.3.2 WAR and WAQ for local similarity

    To retain the local similarity, width-across-reference, WAR, and width-across-

    query, WAQ, are used. WAR and WAQ represent the length of the shortest

    substring of the reference and the query respectively, containing the LCS. These

    measures represent the density of LCS in the reference and the query. Small values

    of WAR and WAQ indicate a dense distribution of LCS. WAR is incremented by 1

    if there is a rough match or jump along the reference. Likewise for the WAQ. WAR

    and WAQ are computed using the following recurrences:

wr_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
wr_{i-1,j-1} + 1 & \text{if } r_i \approx q_j \\
wr_{i-1,j} + 1 & \text{if } r_i \not\approx q_j,\ c_{i-1,j} \ge c_{i,j-1} \\
wr_{i,j-1} & \text{if } r_i \not\approx q_j,\ c_{i-1,j} < c_{i,j-1}
\end{cases}
\qquad (3.4)

wq_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
wq_{i-1,j-1} + 1 & \text{if } r_i \approx q_j \\
wq_{i-1,j} & \text{if } r_i \not\approx q_j,\ c_{i-1,j} \ge c_{i,j-1} \\
wq_{i,j-1} + 1 & \text{if } r_i \not\approx q_j,\ c_{i-1,j} < c_{i,j-1}
\end{cases}
\qquad (3.5)

    In (3.4) and (3.5), some of the cases and conditions are dropped from [30] for

    the sake of clarity.

3.3.3 Score matrix

    WAR, WAQ and cost are used to compute the score of a common subsequence in

    the following way:

Score_{i,j} =
\begin{cases}
\left(\beta\,\frac{c_{i,j}}{wr_{i,j}} + (1-\beta)\,\frac{c_{i,j}}{wq_{i,j}}\right)\frac{c_{i,j}}{n} & \text{if } c_{i,j} \ge \tau n \\
0 & \text{otherwise}
\end{cases}
\qquad (3.6)

In (3.6), a large value of $\frac{c_{i,j}}{wr_{i,j}}$ suggests that the density of the RLCS is high in the
reference. Similarly, a large value of $\frac{c_{i,j}}{wq_{i,j}}$ is indicative of a higher density of the RLCS in
the query. $\beta$ weighs between these two ratios. A large value of $\frac{c_{i,j}}{n}$ indicates that a
large part of the query has been matched, where n is the length of the query. $\tau$ is
the matching rate that represents how long the length of the RLCS should be, with
respect to the query.

    The algorithm to compute these values using Dynamic Programming is pre-

    sented in [30].
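For concreteness, one way the tables of (3.3)-(3.6) could be filled is sketched below. The bookkeeping in [30] (direction matrix, tie handling) is more detailed, and beta and tau below stand for the weighting and matching-rate parameters written as β and τ in the equations above; all names are assumptions of this sketch.

```python
import numpy as np

def rlcs_tables(ref, query, dist, Td, beta=0.5, tau=0.8):
    """Fill the cost (3.3), WAR (3.4), WAQ (3.5) and score (3.6) tables.

    Here m is the query length (n in the thesis notation)."""
    n, m = len(ref), len(query)
    c = np.zeros((n + 1, m + 1))
    war = np.zeros((n + 1, m + 1))
    waq = np.zeros((n + 1, m + 1))
    score = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], query[j - 1])
            if d < Td:                               # rough match
                c[i, j] = c[i - 1, j - 1] + (1.0 - d / Td)
                war[i, j] = war[i - 1, j - 1] + 1
                waq[i, j] = waq[i - 1, j - 1] + 1
            elif c[i - 1, j] >= c[i, j - 1]:         # advance along the reference
                c[i, j] = c[i - 1, j]
                war[i, j] = war[i - 1, j] + 1
                waq[i, j] = waq[i - 1, j]
            else:                                    # advance along the query
                c[i, j] = c[i, j - 1]
                war[i, j] = war[i, j - 1]
                waq[i, j] = waq[i, j - 1] + 1
            if c[i, j] >= tau * m and war[i, j] > 0 and waq[i, j] > 0:
                score[i, j] = (beta * c[i, j] / war[i, j]
                               + (1 - beta) * c[i, j] / waq[i, j]) * c[i, j] / m
    return c, war, waq, score
```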

    3.4 Modified-Rough Longest Common Subsequence

    In this section, the modifications made to the existing RLCS algorithm and the

    rationale behind them are discussed.

3.4.1 Rough and actual length of RLCS

In [30], $c_{i,j}$ is defined as the length of the RLCS. But it actually represents a rough
length of the RLCS because it is incremented by $\sigma_{i,j}$ when there is a rough match. The
resulting value of $c_{i,j}$ need not be an integer. Therefore, it cannot be the actual

    length of any sequence. The actual length of RLCS is defined by the following

    recurrence:

ca_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
ca_{i-1,j-1} + 1 & \text{if } r_i \approx q_j \\
\max(ca_{i-1,j},\, ca_{i,j-1}) & \text{if } r_i \not\approx q_j
\end{cases}
\qquad (3.7)

    In (3.7), cost is incremented by 1 on a rough match. In (3.6), while computing score,

    half of the importance is given to the ratio of rough length of RLCS and the query

    length. Instead of just considering how good the rough length of the RLCS with

    respect to the query length is, it is conjectured that it is also important to consider

    how good the rough length of the RLCS is with respect to the actual length of the

RLCS.

\frac{c_{i,j} + c_{i,j}}{ca_{i,j} + n}

    gives equal importance to both the ratios. This term is similar to the F1 score where

    precision and recall are given equal importance.

    3.4.2 RWAR and RWAQ

    WAR and WAQ represent the width of the shortest substring that contains the

    RLCS. As discussed in the previous subsection, ci, j represents the rough length

which is shorter than the actual length of the RLCS. Therefore, it is not clear
whether $\frac{c_{i,j}}{wr_{i,j}}$ really represents the density of the RLCS in the reference. This term
also penalizes based on the degree of match, while a penalty has already been
accounted for in the term $\frac{c_{i,j}}{n}$. Therefore, a rough width across reference and query

    is required that represents the rough width of the shortest substring containing

the RLCS. On a rough match, the cost is incremented by $\sigma_{i,j}$. At the same time,
when a rough match is obtained, the WAR and WAQ are also incremented by $\sigma_{i,j}$,

    resulting in Rough WAR (RWAR) and Rough WAQ (RWAQ), respectively. When

    there is no match, RWAR and RWAQ are incremented by 1 whereas the cost is not

    incremented. Therefore, RWAR and RWAQ account for the density of the RLCS

    in the reference and query better. RWAR and RWAQ can be computed by the

    following recurrences:

rwr_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
rwr_{i-1,j-1} + \sigma_{i,j} & \text{if } r_i \approx q_j \\
rwr_{i-1,j} + 1 & \text{if } r_i \not\approx q_j,\ c_{i-1,j} \ge c_{i,j-1} \\
rwr_{i,j-1} & \text{if } r_i \not\approx q_j,\ c_{i-1,j} < c_{i,j-1}
\end{cases}
\qquad (3.8)

rwq_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
rwq_{i-1,j-1} + \sigma_{i,j} & \text{if } r_i \approx q_j \\
rwq_{i-1,j} & \text{if } r_i \not\approx q_j,\ c_{i-1,j} \ge c_{i,j-1} \\
rwq_{i,j-1} + 1 & \text{if } r_i \not\approx q_j,\ c_{i-1,j} < c_{i,j-1}
\end{cases}
\qquad (3.9)

3.4.3 Matched rate on the query sequence

In (3.6), $\tau$ is an empirical parameter that is set based on the required match rate on
the entire query sequence. The score is updated by a non-zero value only if the
rough length of the RLCS is greater than $\tau n$. It is not clear how to set the value
of $\tau$, or what it means for the rough length to be greater than a fraction of the query
length. Instead, it would be better to update the score by a non-zero value if the
actual length is greater than $\tau n$. This makes the interpretation clear and makes
it easy to set the value of $\tau$.

    The score update of the modified-RLCS is given by the following equation:

Score_{i,j} =
\begin{cases}
\left(\beta\,\frac{c_{i,j}}{rwr_{i,j}} + (1-\beta)\,\frac{c_{i,j}}{rwq_{i,j}}\right)\frac{c_{i,j}+c_{i,j}}{ca_{i,j}+n} & \text{if } ca_{i,j} \ge \tau n \\
0 & \text{otherwise}
\end{cases}
\qquad (3.10)

    3.5 A Two-Pass Dynamic Programming Search

In Section 3.2 it is illustrated that the sequence of stationary points is crucial for

    a motif. Therefore, RLCS is used to query for the stationary points of the given

    motif in the alapana.

    Music matching using LCS methods for western music is performed on sym-

    bolic music data[19]. The musical notes in this context are the symbols. However,

    in the context of Carnatic music, there is no consistent one to one correspondence

    between the notation and the sung melody. Although, in this work, stationary

    points are used instead of a symbolic notation, one must keep in mind that sta-

    tionary points are not symbols but are continuous pitch values. In order to match

    such pitch values, a rough match instead of an exact match is required. A variant

of the LCS known as the Rough Longest Common Subsequence [30] allows such a

    rough match.

    In this work, a two pass RLCS matching is performed. In the first pass, the

    stationary points of the reference sequence and the query sequence are matched to

    obtain the candidate motif regions. Nevertheless, given two consecutive stationary

    points, the pitch contour between these two stationary points can be significantly

    different for different phrases. This leads to many false alarms. A second pass of

    RLCS is then performed on the regions obtained from the first pass to filter out the

    false alarms from the true motifs.

    3.5.1 First Pass: Determining Candidate Motif Regions using

    RLCS

    The RLCS algorithm used in this work is illustrated in this section. The alapana is

    first windowed and then processed with the RLCS algorithm. The window size

    chosen for this task is 1.5 times the length of the motif queried for. The matrices

    obtained from the RLCS are then processed as follows:

- From the cells of the score matrix with values greater than a threshold, seqFilterTd, sequences are obtained by tracing the direction matrix backwards.

- The duplicate sequences which may be acquired are neglected, preserving unique sequences of length greater than $\tau$ times the length of the reference. These are then added to a sequence buffer.

- This process is repeated for every window. The window is shifted by a hop of one stationary point.

- The sequences obtained thus are grouped.

- Each group, taken from the first element of the first member to the last element of the last member, represents a potential motif region.
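A schematic outline of this first pass is sketched below; the windowing and grouping follow the steps above, while the inner matching routine (match_window) and its thresholds are left abstract and are assumptions of the sketch.

```python
def first_pass_candidates(query_sp, alapana_sp, match_window, hop=1, win_factor=1.5):
    """First pass over the stationary points of an alapana.

    query_sp, alapana_sp : sequences of stationary-point pitch values (cents)
    match_window         : callable returning a list of (start, end, score)
                           hits of the query within one window

    Windows of 1.5x the query length are scanned with a hop of one stationary
    point; overlapping hits are merged into candidate motif regions."""
    win = int(round(win_factor * len(query_sp)))
    hits = []
    for start in range(0, max(1, len(alapana_sp) - win + 1), hop):
        window = alapana_sp[start:start + win]
        for s, e, score in match_window(window, query_sp):
            hits.append((start + s, start + e, score))
    # Group overlapping hits: each group spans from the first element of its
    # first member to the last element of its last member.
    hits.sort()
    groups = []
    for s, e, _ in hits:
        if groups and s <= groups[-1][1]:
            groups[-1] = (groups[-1][0], max(groups[-1][1], e))
        else:
            groups.append((s, e))
    return groups
```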

3.5.2 Second Pass: Determining Motifs from the Groups

    In the first pass a matching of only the stationary points is performed. As men-

    tioned above, even though the stationary points are matched it is not necessary

    that the trajectory between them matches. This leads to a large number of false

    alarms. Now that the search space is reduced, the RLCS is performed between the

    entire pitch contour of the potential motif region obtained in the first pass and the

    motif queried. The entire pitch contour is used in order to account for the trajectory

    information contained in the phrases. The threshold Td used for the first pass is

    tightened in this iteration for better precision while matching the entire feature

    vector. In this iteration, the cell of the score matrix having the maximum value is

    chosen and the sequence is traced back using the direction matrix from this cell.

    This sequence is hypothesized to be the motif. The database and experimentation

    are detailed in the following sections.

    3.6 Dataset

    Table 3.1 gives the details of the dataset of alapana used in this work. As mentioned

    above, this task will be performed on alapanas. The motifs are categorized into two

    types based on their durations: short motifs and long motifs. The details of these

    motifs are given in Table 3.2 and Table 3.3. The average duration is obtained from

    the labeled ground truth. The long motifs are inspired by the raga test conducted

    by Rama Verma1. Most people across the globe were able to unambiguously

    determine the identity of ragas using these motifs. An attempt was made to use

the motifs from Rama Verma's raga test directly. As the recordings are rather noisy,

    1http://www.youtube.com/watch?v=3nRtz9EBfeY


the same motifs were generated by a professional musician. In particular, we have

    chosen only the raga Bhairavi for illustration.

    Table 3.1: Dataset of alapanas

Raga Name   Number of Alapanas   Number of Artists   Average Duration (mins)   Total Duration (mins)
Kamboji     27                   12                  9.73                      262.91
Bhairavi    16                   13                  10.65                     170.48

    Table 3.2: Short Motifs

Raga Name   Labeled Ground Truth   Average Duration (secs)
Kamboji     70                     1.8837
Bhairavi    103                    1.3213

    Table 3.3: Long Motifs

Raga Name   Labeled Ground Truth   Average Duration (secs)
Bhairavi    59                     3.18

    3.7 Experiments and Results

    3.7.1 Querying motifs in the alapanas

    RLCS was performed on the dataset of alapanas. The distance function used for

    RLCS is cubic in nature with the equation given below.

d_{i,j} =
\begin{cases}
\frac{|x_i - y_j|^3}{(3\,st)^3} & \text{if } |x_i - y_j| < 3\,st \\
0 & \text{otherwise}
\end{cases}
\qquad (3.11)

where $x_i$ and $y_j$ represent pitch values and st represents a semitone in cents. Due

    to different styles of various musicians, an exact match between two pitch values

    contributing to the same svara cannot be expected. Hence, in this work a leeway

    of 3 semitones is allowed between pitch values. Musically two pitch values, 3

    semitones apart, cannot be called similar but this issue is addressed by the cubic

    nature of the similarity function. The function reaches its half value when the

    difference in two symbols is approximately half a semitone. Therefore, lower

    distance values are obtained when the corresponding pitch values are at most half

    a semitone apart. In this work, the phrases sung across octaves are ignored. For

this experiment the parameters set were as follows: $T_d = 0.45$; $\beta = 0.5$; $\tau = 0.8$.
The parameter $\tau$, $0 < \tau < 1$, is a user-defined parameter that ensures that the length of
the query motif is matched with that of the alapana.
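A direct transcription of (3.11), taking one semitone as 100 cents; the function name and the example values are only illustrative.

```python
ST = 100.0  # one semitone in cents

def cubic_distance(x, y, st=ST):
    """Cubic distance of (3.11) between two pitch values x, y in cents.

    Within the 3-semitone leeway the distance grows cubically, so pitch values
    at most about half a semitone apart receive very small distances."""
    diff = abs(x - y)
    if diff < 3 * st:
        return diff ** 3 / (3 * st) ** 3
    return 0.0  # branch as printed in (3.11); phrases sung across octaves are ignored

# Example: two pitches half a semitone apart
print(cubic_distance(1200.0, 1250.0))  # ~0.0046
```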

    The details of the number of ground truth motifs retrieved and the total number

    of trues retrieved are given in Table 3.4 and Table 3.5 for short motifs and long motifs

    respectively. The number of false positives, retrieved are however substantial. This

    is affordable since the objective in the first pass is to obtain the maximum number

    of the regions similar to the motif. The second iteration of RLCS is performed to

    filter out the false positives.

    Table 3.4: Short Motifs: Retrieved regions after the first pass

Raga Name   Total Retrieved   True Retrieved   Precision (%)   Recall (%)
Kamboji     719               58               8.07            82.86
Bhairavi    474               91               19.20           88.35

    Now that the candidate motif regions are known, the second pass of RLCS is

    conducted wherein the same motifs are queried in the regions retrieved by the

    first pass. The entire pitch contour of the query and reference are used for this

Table 3.5: Long Motifs: Retrieved regions after the first pass

Raga Name   Total Retrieved   True Retrieved   Precision (%)   Recall (%)
Bhairavi    194               51               26.29           86.44

    task in order to account for the information of trajectory of pitches between the

    stationary points. The requirement of a query motif for such a search is due to

    the scarce rendition of certain characteristic phrases in an alapana. The spotting of

    such phrases proves to be useful to musicians and students for analysis purposes.

    The hits obtained in the second pass are sorted according to the RLCS scores.

    Top 10 hits per alapana are considered to compute the precision and recall. The

    motifs are not exact since they correspond to the extempore enunciation by an

artist. The relevant motifs are all the motifs that were marked as true in that alapana.

    The results are illustrated in Table 3.6 and Table 3.7.

    Table 3.6: Short Motifs: Top 10 retrieved motifs after the second pass

Raga Name   Precision (%)   Recall (%)
Kamboji     40.45           76.00
Bhairavi    41.25           91.04

    Table 3.7: Long Motifs: Top 10 retrieved motifs after the second pass

Raga Name   Precision (%)   Recall (%)
Bhairavi    31.65           74.58

3.7.2 Comparison between RLCS and Modified-RLCS using longer

    motifs

    Motif spotting is performed using RLCS and modified-RLCS on the dataset of

    alapanas using longer motifs as queries. Td is set to 0.45 in both the methods.

    This is done so that pitch values which are approximately one semitone apart are

considered as a rough match. $\tau$ is set to zero because the best value of $\tau$ could be
different for the two methods, which would make the comparison difficult.

    First Voice Activity Detection (VAD) is performed on the alapanas to get the

    voiced parts. This approximately segments the alapana into phrases. Instead of

    the entire alapana, these voiced regions are used. In the first pass, stationary points

    of the query motif and test alapana are used and the motif regions or groups are

    retrieved along with their scores. Each group either corresponds to a motif or a

    false alarm. A true group consists of one or more true positives. Score distribution

    of the true positive groups and false alarm groups for both the algorithms are

    shown in Figure 3.4. Each score value is subtracted from the mean of scores of the

    false alarms such that mean of the false alarms distribution becomes zero for both

    the algorithms. This enables a better comparison between RLCS and modified-

RLCS algorithms. The overlap between the score distributions of the true positive
groups and the false alarm groups is smaller for the modified-RLCS algorithm than for the
RLCS algorithm.

    Motifs are sparsely present in an alapana. Our purpose is to retrieve as many

    motifs as possible. Spotting all or most of the motifs is more crucial than removal

    of all false alarms. Therefore higher penalty is given for missing a motif than

    for a false alarm group. Score threshold is selected from the minimum detection

    cost function for both the algorithms. The sequences whose scores are above the

Figure 3.4: a) True positive groups and false alarm groups score distribution for RLCS. b) True positive groups and false alarm groups score distribution for modified-RLCS.

    score threshold are preserved. The details of the comparison after the first pass are

    shown in Table 3.8. Modified-RLCS has shown a clear improvement over RLCS in

    terms of false alarms and average duration of true positives and false alarms.

    Table 3.8: Long Motifs: Retrieved regions after the first pass

Algorithm       Total Retrieved   True Retrieved   True Positive Duration (avg.)   False Alarm Duration (avg.)   Precision (%)   Recall (%)
RLCS            194               51               9.61 secs                       12.73 secs                    26.29           86.44
Modified-RLCS   151               52               9.00 secs                       11.95 secs                    34.44           88.14

    These regions are used as the tests in the second pass. In the second pass, tonic

    normalized smoothed pitch contour is used as a feature. The primary objective

    of the second pass is to locate the motifs within a group and remove as many

false alarm groups as possible. The details of the comparison after the second

    pass are given in Table 3.9. There is a reduction in the number of false alarms in

    both the algorithms but the precision and localization of motifs are still better for

    modified-RLCS.

Table 3.9: Long Motifs: Retrieved regions after the second pass

Algorithm       Total Retrieved   True Retrieved   True Positive Duration (avg.)   False Alarm Duration (avg.)   Precision (%)   Recall (%)
RLCS            180               50               8.25 secs                       11.12 secs                    27.78           84.75
Modified-RLCS   144               50               7.68 secs                       11.00 secs                    34.72           84.75

    3.8 Discussion

    From the results obtained in Table 3.6 and Table 3.7, it is clear that even though

    the precision is low, the recall is high in most of the cases. Certain partial matches

    are also obtained where either the first part of the query is matched or the end of

    the query is matched. These are movements similar to those of the phrases and

    are interesting for a listener, learner, or researcher. High scores were obtained for

    certain false alarms. This is primarily due to melodic similarities between the false

    alarm and the original phrase.

Modified-RLCS results in fewer false alarms compared to RLCS in both the passes.
The duration of the hits is also shorter when compared to that of RLCS, which means
that the localization is better in modified-RLCS. This is primarily due to the fact

    that the actual length of the match is used in the modified-RLCS. In (3.6), the term

$\frac{c_{i,j}}{n}$

focuses on getting an RLCS whose rough length is as large as the length of the query motif,
but in (3.10), the term

$\frac{c_{i,j} + c_{i,j}}{ca_{i,j} + n}$

gives equal importance to both: getting an RLCS whose rough length is as large
as the length of the query motif and also on getting an RLCS whose rough length
is as large as the actual length of the RLCS. Due to this, shorter sequences also get

    a good score if they represent the motif adequately.

    3.8.1 Importance of VAD in motif spotting

    Voice Activity Detection (VAD) on the alapanas is a very crucial step. This step

    removes the noise and reduces the search space. Table 3.10 shows the results after

    the two passes using the modified-RLCS algorithm. The method for selecting the

    score thresholds after both the passes remains the same. The groups have become

    much longer though the number of true positive groups and false alarm groups

    has reduced. Some of the true positive groups have more than one instance of the

    motif. Therefore, the number of true positives are much more than the number

    of true positive groups. But they are not at all localized properly even after the

second pass. The number of false alarm groups is also small, but their duration is
very long, approximately 1 minute. The total duration of the false alarms without

    VAD is much more than that of those with VAD after each of the passes. This

    vindicates the use of VAD.

Table 3.10: Retrieved Groups after both the passes for modified-RLCS without VAD

Pass No.   True Groups Retrieved   True Positives Retrieved   True Group Duration (avg.)   False Groups Retrieved   False Group Duration (avg.)
Pass 1     29                      57                         80.31 secs                   20                       68.23 secs
Pass 2     38                      57                         44.33 secs                   31                       45.45 secs

3.9 Summary

    In this work, RLCS is used for motif spotting in alapanas in Carnatic music. It is

    illustrated that the stationary points of the pitch contour of a musical piece hold

    significant music information. It is then shown that quantizing the pitch contour of

    the alapana at the stationary points leads to no loss of information while it results

    in a significant reduction in the search space. The RLCS method is shown to

    give a high recall for the motif queried. Given that the objective is to explore the

    musical traits of a raga by spotting interesting melodic motifs rendered by various

    artists, the recall of the motif queried is of higher importance than the precision. A

    modified version of RLCS algorithm is also presented which gives better scores for

    sub-sequences which are shorter than the query but have matched reasonably well

    with most of the query. Modified-RLCS was tested on longer motifs and compared

    favorably with the original RLCS. The importance of performing Voice Activity

    Detection is also discussed.

CHAPTER 4

    Motif Discovery

    4.1 Introduction

    A raga in Carnatic music is characterised by typical phrases or motifs. They are

    primarily pitch trajectories in the time-frequency plane. Although, for annotation

    purposes, ragas in Carnatic music are based on 12 srutis (or semitones), the gamakas

    associated with the same semitone can vary significantly across ragas as discussed

    in Chapter 2. Nevertheless, although the phrases do not occupy fixed positions in

    the time-frequency (t-f) plane, an experienced listener can determine the identity

of a raga within a few seconds of an alapana. The objective of the work presented here

    is to determine typical motifs of a raga automatically. This is obtained by analyzing

    various compositions that are composed in a particular raga. Unlike Hindustani

    music, there is a huge repository of compositions by a number of composers in

    different ragas. It is often stated by musicians that the famous composers have

    composed such that a single line of a composition is replete with the motifs of the

    raga. In this work, we therefore take single lines of different compositions and

    determine the typical motifs of the raga.

    In a Carnatic music concert, many listeners from the audience are able to

    identify the raga at the very beginning of the composition, usually during the

singing of the first line itself; a line corresponds to one or more tala cycles.

    Thus, first lines of the compositions could contain typical motifs of a raga. A

    pattern which is repeated within a first line could still be not specific to a raga.

Figure 4.1: RLCS matching two sequences partially

    Whereas, a pattern that is present across different composition lines could be a

    typical motif of that raga. Instead of just using first lines, we have also used other

    lines of compositions, namely, the lines from pallavi, anupallavi and charanam. In

    this chapter, an attempt is made to find repeating patterns across these lines and

    not within a line. Typical motifs are filtered from the generated repeating patterns

    during post processing. These typical motifs are available online 1.

    The length of the typical motif to be discovered is not known a priori. Therefore

    there is a need for a technique which can itself determine the length of the motif

    at the time of discovering it. Dynamic Time Warping (DTW) based algorithms can

only find a pattern of a specific length since they perform end-to-end matching of the

    query and test sequence. There is another version of DTW known as Unconstrained

    End Point-DTW (UE-DTW) that can match the whole query with a partial test

    1http://www.iitm.ac.in/donlab/typicalmotifs.html

but still the query is not partially matched. Longest Common Subsequence (LCS)

    algorithm on the other hand can match the partial query with partial test sequences

    since it looks for a Longest Common Subsequence which need not be end-to-end.

    LCS by itself is not appropriate as it requires discrete symbols and does not account

    for local similarity. A modified version of LCS known as Rough Longest Common

    Subsequence takes continuous symbols and takes into account the local similarity

    of the Longest Common Subsequence. The algorithm proposed in [30] to find

    Rough Longest Common Subsequence between two sequences fits the bill for the

    task of motif discovery. An example of RLCS algorithm matching two partial

    phrases is shown in Figure 4.1. The two music segments are represented by their

    tonic normalized pitch contours. The stationary points, where the first derivative

    is zero, of the tonic normalized pitch contour are first determined. The points are

    then interpolated using cubic Hermite interpolation to smooth the contour.

    In Chapter 3, plenty of false alarms were observed. One of the most prevalent

false alarms was found to be due to a sustained note appearing in the phrase. The

    slope of the linear trend in stationary points along with its standard deviation is

    used to address this issue.
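One possible reading of this filter is sketched below: a candidate whose stationary points form a nearly flat linear trend with little spread around it is treated as a sustained note. The function name and thresholds are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def looks_like_sustained_note(stationary_pitches, slope_thresh=5.0, std_thresh=30.0):
    """Heuristic: fit a linear trend to the stationary-point pitches (in cents).

    A near-zero slope together with a small standard deviation of the residuals
    suggests a single sustained note rather than a moving phrase."""
    y = np.asarray(stationary_pitches, dtype=float)
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    residual_std = np.std(y - (slope * x + intercept))
    return abs(slope) < slope_thresh and residual_std < std_thresh
```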

    The rest of the chapter is organized as follows. In Section 4.2 the use of com-

    position lines to find motifs is discussed. Section 4.3 discusses the optimization

    criteria to find the Rough Longest Common Subsequence. Section 4.4 describes

the proposed approach for discovering typical motifs of ragas. Section 4.5 describes

    the dataset used in this work. Experiments and results are presented in Section

    4.6.

4.2 Lines from the compositions

As previously mentioned, the first line of a composition contains the characteristic
traits of a raga. The importance of the first lines and the raga information they hold is
illustrated in great detail in T. M. Krishna's book on Carnatic music [26]. T. M.
Krishna states that the opening section, called pallavi (discussed in Chapter 2), directs

    the melodic flow of the raga and through its rendition, the texture of the raga can

    be felt. Motivated by this observation, an attempt is made to verify the conjecture

    that typical motifs of a raga can be obtained from the first lines of compositions.

Along with the lines from the pallavi, we have also selected a few lines from other

    sections, namely, anupallavi and charanam. Anupallavi comes after pallavi and the

    melodic movements in this section tend to explore the raga in the higher reaches

    of the octave as discussed in Chapter 2.

    4.3 Optimization criteria to find Rough Longest Com-

    mon Subsequence

The Rough Longest Common Subsequence (RLCS) between two sequences, $X = x_1, x_2, \ldots, x_n$ and $Y = y_1, y_2, \ldots, y_m$, of length n and m, is defined as the Longest Common Subsequence (LCS) $Z_{XY} = (x_{i_1}, y_{j_1}), (x_{i_2}, y_{j_2}), \ldots, (x_{i_p}, y_{j_p})$, with $1 \le i_1 < i_2 < \ldots < i_p \le n$ and $1 \le j_1 < j_2 < \ldots < j_p \le m$.

    5.4 Raga Verification

Let $T_{raga} = \{t_1, t_2, \ldots, t_{N_{raga}}\}$ represent a set of template recordings, where raga
refers to the name of the raga and $N_{raga}$ is the total number of templates for that

raga. During testing, an input test recording, X, with a claim is tested against all

    the template recordings of the claimed raga. The final score is computed as given

    in (5.7).

score(X, claim) = \max_{Y \in T_{claim}} \big( score(lcss_{XY}) \big) \qquad (5.7)

    The final decision, of accepting or rejecting the claim, directly based on this score

    could be erroneous. Score normalisation with cohorts is essential to make a deci-

    sion, especially when the difference between two ragas is subtle.
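As a sketch, (5.7) amounts to taking the best template score for the claimed raga; lcss_score below stands in for the LCSS similarity between two recordings and is an assumed name, not the thesis's API.

```python
def claim_score(test_recording, claimed_raga, templates, lcss_score):
    """Score of a claim as in (5.7): best LCSS score over the claimed raga's templates.

    templates : dict mapping a raga name to its list of template recordings."""
    return max(lcss_score(test_recording, t) for t in templates[claimed_raga])
```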

    5.4.1 Score Normalization

LCSS scores corresponding to correct and incorrect claims are referred to as true and
imposter scores, respectively. If the imposter is a cohort raga, then the imposter
score is also referred to as a cohort score. Various score normalization techniques are

    discussed in the literature for speech recognition, speaker/language verification

    and spoken term detection [1, 34].

    Zero normalization (Z-norm) uses the mean and variance estimate of cohort

    scores for scaling. The advantage of Z-norm is that the normalization parameters

    can be estimated off-line. Template recordings of a raga are tested against template

    recordings of its cohorts and the resulting scores are used to estimate a raga specific

    mean and variance for the imposter distribution. The normalized scores using Z-

    norm can be calculated as

score_{norm}(X, claim) = \frac{score(X, claim) - \mu_I^{claim}}{\sigma_I^{claim}} \qquad (5.8)

where $\mu_I^{claim}$ and $\sigma_I^{claim}$ are the estimated imposter mean and standard deviation for the claimed raga.

    Test normalization (T-norm) is also based on a mean and variance estimation

    of cohort scores for scaling. The normalization parameters in T-norm are estimated

    online as compared to their offline estimation in Z-norm. During testing, a test

    recording is tested against template recordings of cohort ragas and the resulting

    scores are used to estimate mean and variance parameters. These parameters are

    then used to perform the normalization given by (5.8).

    The test recordings of a raga may be scored differently against templates corre-

    sponding to the same raga or imposter raga. This can cause overlap between the

    true and imposter score distributions. T-norm attempts to reduce this overlap.

    The templates that are stored and the audio clip that is used during test can be

    from different environments.
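A small sketch of the two normalisations follows; the imposter statistics for Z-norm are assumed to be precomputed per raga, while for T-norm the cohort scores come from scoring the test recording itself against the cohorts' templates. Names are illustrative.

```python
import statistics

def z_norm(raw_score, imposter_mean, imposter_std):
    """Z-norm as in (5.8): offline imposter statistics of the claimed raga."""
    return (raw_score - imposter_mean) / imposter_std

def t_norm(raw_score, cohort_scores):
    """T-norm: the same scaling, but the statistics are estimated online from
    the test recording's scores against the cohort templates."""
    mu = statistics.mean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)
    return (raw_score - mu) / sigma
```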

    5.5 Performance evaluation

In this section, we describe the results of raga verification using the LCSS algorithm

    in comparison with Rough Longest Common Subsequence (RLCS) algorithm [30]

    and Dynamic Time Warping (DTW) algorithm using different normalizations.

    5.5.1 Experimental configuration

Only 17 ragas out of 30 were used for raga verification, as a sufficient number of
relevant cohorts could be obtained from the 30 ragas only for these 17. This is due to the non-
symmetric nature of the cohorts as discussed in Section 5.2. For raga verification,

    40% of the pallavi lines are used as templates and remaining 60% are used for

Table 5.2: EER (%) for different algorithms using different normalizations on different datasets.

Algorithm      Dataset   No Norm   Z-norm   T-norm
DTW            D1        27.78     29.88    17.45
               D2        40.81     40.03    35.96
RLCS           D1        24.43     27.22    14.87
               D2        41.72     42.58    41.20
RLCS-MOD       D1        20.88     22.72    13.25
               D2        36.72     37.68    34.58
LCSS (hard)    D1        29.00     31.75    15.65
               D2        40.28     40.99    34.11
LCSS (soft)    D1        21.89     24.11    12.01
               D2        37.24     38.96    34.57

testing. This partitioning of the dataset is done in two ways, referred to as D1 and D2.
In D1, the variations of a pallavi line might fall into both the templates and the test set,
though it is not necessary. Variations of a pallavi line are different from the pallavi line due
to improvisations. In D2, these variations either all belong to the templates or all
belong to the test set, but are strictly not present in both. The values of the thresholds,
sim and rc, are empirically chosen as 0.45 and 0.5, respectively. The penalty issued for gaps in
segments is empirically chosen as 0.5.

    5.5.2 Results

    Table 5.2 and Figure 5.2 show the comparison of LCSS with DTW and RLCS using

    different normalizations. Equal Error rate (EER) refers to a point where false alarm

rate and miss rate are equal. For T-norm, the best 20 cohort scores were used for

    normalization. LCSS (soft) with T-norm performs best for D1 around the EER

    point, and for high miss rates and low false alarms, whereas it performs poorer

    than LCSS (hard) for low miss rates and high false alarms. This behavior appears

    to be reversed for D2. The magnitude around EER is much greater for D2. This

Figure 5.2: DET curves comparing the LCSS algorithm with different algorithms using different score normalizations: a) DET curves for dataset D1, b) DET curves for dataset D2.

is because none of the variations of the pallavi lines in the test set are present in the
templates. It is also shown that RLCS performs poorer than all the other algorithms
for D2. The curves also show no improvement for Z-norm compared to the baseline
with no normalization. This can happen due to the way the normalization parameters
are estimated for Z-norm. For example, some of the templates, which may not be
similar to the test, can be similar to some of the cohorts' templates, resulting in
a higher mean. This would not have happened in T-norm, where the test itself is
tested against the cohorts' templates.

    5.6 Discussion

    In this section, we discuss how LCSS (hard) and LCSS (soft) can be combined

    to achieve better performance. We also verify that T-norm reduces the overlap

    between true and imposter scores.

Table 5.3: Number of claims correctly verified by hard-LCSS only, by soft-LCSS only, by both, and by neither of them for D1 and D2 using T-norm

Dataset   Claim type   Hard only   Soft only   Both   Neither
D1        True         23          55          289    77
          False        46          78          1745   54
D2        True         47          23          155    220
          False        99          75          1585   168

    5.6.1 Combining hard-LCSS and soft-LCSS

    Instead of selecting a threshold, we will assume that a true claim is correctly

    verified when its score is greater than all the cohort scores. Similarly, a false claim

is correctly verified when its score is less than at least one of the cohort scores.

    Table 5.3 shows the number of claims correctly verified only by hard-LCSS, only

    by soft-LCSS, by both and by neither of them. It is clear that there is an overlap

    between the correctly verified claims of hard-LCSS and soft-LCSS. Nonetheless,

the number of claims verified by only one of the two is also significant. Therefore, the

    combination of these two algorithms could result in a better performance.
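The counting rule behind Table 5.3 can be written as a small check, under the assumptions stated above (the score lists are assumed to be already computed):

```python
def claim_correctly_verified(claim_is_true, claim_score, cohort_scores):
    """A true claim counts as correctly verified if it beats every cohort score;
    a false claim counts as correctly verified if at least one cohort score beats it."""
    if claim_is_true:
        return all(claim_score > s for s in cohort_scores)
    return any(claim_score < s for s in cohort_scores)
```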

    5.6.2 Reduction of overlap in score distribution by T-norm

    Figure 5.3 shows the effect of T-norm on the distribution of hard-LCSS scores. It is

clearly seen that the overlap between the true and imposter score distributions is

    reduced significantly. For visualization purposes, the true score distributions are

    scaled to zero mean and unit variance and corresponding imposter score distribu-

    tions are scaled appropriately.

Figure 5.3: Showing the effect of T-norm on the score distribution (LCSS (hard) true and imposter score densities, without normalization and with T-norm)

5.6.3 Scalability of raga verification

The verification of a raga depends on the number of its cohort ragas, which are
usually 4 or 5. Since it does not depend on all the ragas in the dataset, as raga
identification does, any number of ragas can be added to the dataset.

    5.7 Summary

    In this Chapter, we have presented a different approach to raga analysis in Carnatic

    music. Instead of raga identification, raga verification is performed. A set of cohorts

    for every raga is defined. The identity of an audio clip is presented with a claim.

    The claimed raga is verified by comparing with the templates of the claimed raga

    and its cohorts by using a novel approach, Longest Common Segment Set (LCSS).

    A set of 17 ragas and its cohorts constituting 30 ragas is tested using appropriate

    score normalization techniques. An equal error rate of about 12% is achieved. This

    approach is scalable to any number of ragas as the given raga and only its cohorts

    need to be added to the system.

CHAPTER 6

    Conclusion

    Typical motifs of a raga are used to establish its identity in all improvisational and

    compositional forms. Along with raga identity, typical motifs can also be used to

    index a recording for archival purposes. Further, indexed motifs can also be used

    to explore and to analyze the melodic phrases connecting them, which could be

    useful for both listeners and learners of Carnatic music.

    The objective of this thesis was to develop algorithmic techniques for automatic

    extraction of typical motifs and for performing raga verification using the regions

    replete with typical motifs. Some of the salient points presented in this thesis are

    as follows:

    6.1 Salient Points

- It was shown using pitch histograms that the notes in Carnatic music have a greater pitch range as compared to Hindustani music and Western classical music. This renders the symbolic representation of Carnatic music a non-trivial task and poses significant challenges in the analysis of Carnatic music.

- The stationary points of the pitch contour were shown to preserve the essential raga information; however, the exact melodic information was lost. For the task of finding different renditions of typical motifs, these stationary points were used to reduce the search space. A measure based on the slope of the linear trend in stationary points, along with its standard deviation, is used to reduce the false alarms.

- An algorithm was proposed for time-series matching which is a modification of an existing algorithm known as Rough Longest Common Subsequence. This algorithm can match shorter sequences that are common between two longer sequences. However, the score was penalized with respect to the length of the longer sequences. Therefore, matched shorter sequences can get low scores suggesting that the match is poor even when the match is good.

- The second algorithm proposed for time-series matching, known as Longest Common Segment Set, was novel. It can also match shorter sequences that are common between two longer sequences, but the score was not penalized with respect to the longer sequences. Therefore, it was more effective in the extraction of the common shorter sequences.

- Typical motifs of duration of approximately four seconds were found to be more relevant for raga identity. Shorter motifs had less context and resulted in a great deal of false alarms.

- Typical motifs were found to be prevalent in the pallavi lines of the compositions. Therefore, these pallavi lines were used in the task of raga verification.

- In raga verification, cohort ragas (usually four or five ragas) were used for normalizing the score instead of all the ragas in the dataset. Therefore, the proposed raga verification system was found to be scalable to any number of ragas. For a new raga to be added into the system, only the templates of the new raga and its cohorts were required, without altering the existing system.

    6.2 Criticism of the work

    In this section, we discuss the shortcomings of the approaches proposed in this

    thesis.

- The proposed algorithms for time-series matching require that the ordering of the common shorter sequences is the same in both the longer sequences. If the ordering is different, not all of the common shorter sequences are matched.

- The algorithms also fail to match sequences if they are in different octaves.

- The performance of the algorithms is also sensitive to pitch errors. This problem is dealt with to some extent by smoothing the pitch contours if the pitch errors are not significantly large.

- Typical motifs are retrieved only if they repeat across the composition lines. Therefore, this approach relies on a large number of composition lines.

- Raga verification also needs a large number of composition lines (templates) such that most of the typical motifs are represented.

6.3 Future work

    Given the drawbacks of the proposed approaches in the previous section, the

    following improvements can be made:

- For time-series matching, when the ordering of common shorter sequences is different, no single alignment can align all the common shorter sequences. In such situations, different alignments can be inspected to extract all the common shorter sequences irrespective of their order.

- For matching sequences that belong to different octaves, one of the two sequences can be shifted to different octaves and the matching can be performed with all the shifted sequences.

- Instead of using pitch to represent the melody, a transformation of the frequency spectrum can be used that reduces other noises and preserves the melody. This will help in improving the performance of the algorithm, which is sensitive to pitch errors.

LIST OF PAPERS BASED ON THESIS

1. Shrey Dutta, Krishnaraj Sekhar PV and Hema A. Murthy. Raga Verification in Carnatic Music using Longest Common Segment Set. In Proceedings of the 16th International Society for Music Information Retrieval Conference, 2015.

2. Shrey Dutta and Hema A. Murthy. Discovering Typical Motifs of a Raga from One-Liners of Songs in Carnatic Music. In Proceedings of the 15th International Society for Music Information Retrieval Conference, pages 397-402, 2014.

3. Shrey Dutta and Hema A. Murthy. A modified rough longest common subsequence algorithm for motif spotting in an Alapana of Carnatic Music. In 20th National Conference on Communications (NCC), pages 1-6, 2014.

4. Vignesh Ishwar, Shrey Dutta, Ashwin Bellur and Hema A. Murthy. Motif Spotting in an Alapana in Carnatic Music. In Proceedings of the 14th International Society for Music Information Retrieval Conference, pages 499-504, 2013.

REFERENCES

[1] Roland Auckenthaler, Michael Carey, and Harvey Lloyd-Thomas. Score normalization for text-independent speaker verification systems. Digital Signal Processing, 10:42-54, 2000.

[2] Ashwin Bellur and Hema A Murthy. A cepstrum based approach for identifying tonic pitch in Indian classical music. In National Conference on Communications, pages 1-5, 2013.

[3] Yueguo Chen, Mario A. Nascimento, Beng Chin Ooi, and Anthony K. H. Tung. Spade: On shape-based pattern detection in streaming time series. In International Conference on Data Engineering, pages 786-795, 2007.

[4] Bill Chiu, Eamonn Keogh, and Stefano Lonardi. Probabilistic discovery of time series motifs. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 493-498, 2003.

[5] P Chordia and A Rae. Raag recognition using pitch-class and pitch-class dyad distributions. In Proceedings of International Society for Music Information Retrieval Conference, ISMIR, pages 431-436, 2007.

[6] Tom Collins, Andreas Arzt, Sebastian Flossmann, and Gerhard Widmer. Siarct-cfp: Improving precision and the discovery of inexact musical patterns in point-set representations. In International Society for Music Information Retrieval, pages 549-554, 2013.

[7] Tom Collins, Jeremy Thurlow, Robin Laney, Alistair Willis, and Paul H. Garthwaite. A comparative evaluation of algorithms for discovering translational patterns in baroque keyboard works. In International Society for Music Information Retrieval, pages 3-8, 2010.

[8] Darrell Conklin. Discovery of distinctive patterns in music. Intelligent Data Analysis, pages 547-554, 2010.

[9] Darrell Conklin. Distinctive patterns in the first movement of Brahms' string quartet in C minor. Journal of Mathematics and Music, 4(2):85-92, 2010.

[10] Jonathan D. Cryer and Kung-Sik Chan. Time Series Analysis: with Applications in R. Springer, 2008.

[11] Pranay Dighe, Parul Agarwal, Harish Karnick, Siddartha Thota, and Bhiksha Raj. Scale independent raga identification using chromagram patterns and swara based features. In 2013 IEEE International Conference on Multimedia and Expo Workshops, San Jose, CA, USA, July 15-19, 2013, pages 1-4, 2013.

[12] Pranay Dighe, Harish Karnick, and Bhiksha Raj. Swara histogram based structural analysis and identification of Indian classical ragas. In Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013, Curitiba, Brazil, November 4-8, 2013, pages 35-40, 2013.

[13] Subbarama Dikshitulu. Sangita sampradaya pradarsini. The Music Academy Madras, Vol. 2, 2011.

[14] D.P.W. Ellis and G.E. Poliner. Identifying cover songs with chroma features and dynamic programming beat tracking. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 4, pages 1429-1432, 2007.

[15] F. N. Fritsch and R. E. Carlson. Monotone Piecewise Cubic Interpolation. SIAM Journal on Numerical Analysis, Vol. 17, No. 2, 1980.

[16] Toni Giorgino. Computing and visualizing dynamic time warping alignments in R: The dtw package. Journal of Statistical Software, 31(7):1-24, 2009.

[17] AnYuan Guo and Hava Siegelmann. Time-warped longest common subsequence algorithm for music retrieval. In Proceedings of 5th International Conference on Music Information Retrieval (ISMIR), 2004. http://works.bepress.com/hava_siegelmann/13.

[18] H G Ranjani, S Arthi, and T V Sreenivas. Shadja, swara identification and raga verification in alapana using stochastic models. In 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 29-32, 2011.

[19] I S H Suyoto, A L Uitdenbogerd, and F Scholer. Searching musical audio using symbolic queries. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, pages 372-381, 2008.

[20] Vignesh Ishwar, Ashwin Bellur, and Hema A Murthy. Motivic analysis and its relevance to raga identification in Carnatic music. In Workshop on Computer Music, Istanbul, Turkey, July 2012. http://compmusic.upf.edu/publications.

[21] J Serra, G K Koduri, M Miron, and X Serra. Tuning of sung Indian classical music. In Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR, pages 157-162, 2011.

[22] Berit Janssen, W. Bas de Haas, Anja Volk, and Peter van Kranenburg. Discovering repeated patterns in music: state of knowledge, challenges, perspectives. International Symposium on Computer Music Modeling and Retrieval (CMMR), pages 225-240, 2013.

[23] Gopala Krishna Koduri, Sankalp Gulati, and Preeti Rao. A survey of raaga recognition techniques and improvements to the state-of-the-art. Sound and Music Computing, 2011.

[24] Gopala Krishna Koduri, Sankalp Gulati, Preeti Rao, and Xavier Serra. Raga recognition based on pitch distribution methods. Journal of New Music Research, 41(4):337-350, 2012.

[25] A.S. Krishna, P.V. Rajkumar, K.P. Saishankar, and M. John. Identification of Carnatic raagas using hidden markov models. In Applied Machine Intelligence and Informatics (SAMI), 2011 IEEE 9th International Symposium on, pages 107-110, Jan. 2011.

[26] T. M. Krishna. A Southern Music: The Karnatic Story, chapter 5. HarperCollins, India, 2013.

[27] T M Krishna and Vignesh Ishwar. Carnatic music: Svara, gamaka, motif and raga identity. In Workshop on Computer Music, Istanbul, Turkey, July 2012. http://compmusic.upf.edu/publications.

[28] A Krishnaswamy. Application of pitch tracking to South Indian classical music. In Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 557-560, 2003.

[29] V. Kumar, H. Pandya, and C.V. Jawahar. Identifying ragas in Indian music. In 22nd International Conference on Pattern Recognition (ICPR), pages 767-772, 2014.

[30] Hwei-Jen Lin, Hung-Hsuan Wu, and Chun-Wei Wang. Music matching based on rough longest common subsequence. Journal of Information Science and Engineering, pages 95-110, 2011.

[31] Lie Lu, Muyuan Wang, and Hong-Jiang Zhang. Repeating pattern discovery and structure analysis from acoustic music data. In Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, pages 275-282, 2004.

[32] David Meredith, Kjell Lemstrom, and Geraint A. Wiggins. Algorithms for discovering repeated patterns in multidimensional representations of polyphonic music. Journal of New Music Research, pages 321-345, 2002.

[33] Meinard Muller, Frank Kurth, and Michael Clausen. Audio matching via chroma-based statistical features. In Proceedings of International Society for Music Information Retrieval (ISMIR), pages 288-295, 2005.

[34] Jiri Navratil and David Klusacek. On linear DETs. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 229-232, 2007.

[35] Gaurav Pandey, Chaitanya Mishra, and Paul Ipe. Tansen: A system for automatic raga identification. In Indian International Conference on Artificial Intelligence, pages 1350-1363, 2003.

[36] Pranav Patel, Eamonn Keogh, Jessica Lin, and Stefano Lonardi. Mining motifs in massive time series databases. In Proceedings of IEEE International Conference on Data Mining (ICDM'02), pages 370-377, 2002.

[37] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Second Edition. Oxford University Press, 1992.

[38] P. Rao, J. Ch. Ross, K. K. Ganguli, V. Pandit, V. Ishwar, A. Bellur, and H. A. Murthy. Melodic motivic analysis of Indian music. Journal of New Music Research, 43(1):115-131, 2014.

[39] Joe Cheri Ross, Vinutha T. P., and Preeti Rao. Detecting melodic motifs from audio for Hindustani classical music. In Proceedings of 13th International Society for Music Information Retrieval (ISMIR), pages 193-198, 2012.

[40] Joe Cheri Ross and Preeti Rao. Detection of raga-characteristic phrases from Hindustani classical music audio. Workshop on Computer Music, 2012. http://compmusic.upf.edu/publications.

[41] Justin Salamon and Emilia Gomez. Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech and Language Processing, 20(6):1759-1770, August 2012.

[42] Sridharan Sankaran, Krishnaraj P V, and Hema A Murthy. Automatic segmentation of composition in Carnatic music using time-frequency CFCC templates. In Proceedings of 11th International Symposium on Computer Music Multidisciplinary Research (CMMR), 2015.

[43] J. Serra, E. Gomez, P. Herrera, and X. Serra. Chroma binary similarity and local alignment applied to cover song identification. Audio, Speech, and Language Processing, IEEE Transactions on, 16(6):1138-1151, Aug 2008.

[44] Joan Serra, Gopala K. Koduri, Marius Miron, and Xavier Serra. Assessing the tuning of sung Indian classical music. In Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011, Miami, Florida, USA, October 24-28, 2011, pages 157-162, 2011.

[45] Sankalp Gulati, Joan Serra, and Xavier Serra. An evaluation of methodologies for melodic similarity in audio recordings of Indian art music. In Proceedings of the 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015, pages 678-682, April 2015.

[46] Surendra Shetty. Raga mining of Indian music by extracting arohana-avarohana pattern. International Journal of Recent Trends in Engineering, 1(1), 2009.

[47] Rajeswari Sridhar and Tv Geetha. Raga identification of Carnatic music for music information retrieval. International Journal of Recent Trends in Engineering, 1(1):1-4, 2009.

[48] M Subramanian. Carnatic ragam thodi - pitch analysis of notes and gamakams. Journal of the Sangeet Natak Akademi, XLI(1):3-28, 2007.

[49] D Swathi. Analysis of Carnatic music: A signal processing perspective. M.Tech. Thesis, IIT Madras, 2009.

[50] Alexandra L. Uitdenbogerd and Justin Zobel. Manipulation of music for melody matching. In MULTIMEDIA '98: Proceedings of the sixth ACM international conference on Multimedia, pages 235-240, 1998.
