
• Analysis of Motifs in Carnatic Music: A Computational Perspective

    A THESIS

    submitted by

    SHREY DUTTA

    for the award of the degree

    of

MASTER OF SCIENCE (by Research)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

INDIAN INSTITUTE OF TECHNOLOGY, MADRAS. October 2015

  • THESIS CERTIFICATE

    This is to certify that the thesis entitled Analysis of Motifs in Carnatic Music:

    A Computational Perspective, submitted by Shrey Dutta, to the Indian Institute

    of Technology, Madras, for the award of the degree of Master of Science (by

    Research), is a bona fide record of the research work carried out by him under my

    supervision. The contents of this thesis, in full or in parts, have not been submitted

    to any other Institute or University for the award of any degree or diploma.

Dr. Hema A. Murthy
Research Guide
Professor
Dept. of Computer Science and Engineering
IIT-Madras, 600 036

    Place: Chennai

    Date:

  • ACKNOWLEDGEMENTS

    I joined IIT Madras with the intention of mastering the techniques used in machine

    learning. There is so much data available in digital form and I used to think that

machine learning techniques help in making sense of this data just as the human brain makes sense of the raw data received from different senses. As I started gaining a deep understanding of machine learning techniques, I realized that these techniques are not mature enough to mimic the human brain and thus should not be

    used blindly. I understood that the data needs to be represented in a sensible form

    which depends on the task under consideration. These techniques are designed to

    use this representation in achieving the desired task. After understanding this, I

    was able to use the existing techniques efficiently as well as design new techniques

    when required. This level of understanding was not possible without the immense

    knowledge and experience shared by my adviser, Prof. Hema A. Murthy, through

    endless captivating discussions.

    I would like to express my sincere gratitude to her for the excellent guidance,

    patience and providing me with an excellent atmosphere for doing research. She

    helped me to develop my background in signal processing and machine learning

    and to experience the practical issues beyond the textbooks. She has not only

    helped in improving my perspective towards research but also towards life.

    I would like to thank my collaborators Vignesh Ishwar, Krishnaraj Sekhar

    and Ashwin Bellur. The completion of this thesis would not have been possible

    without their contribution. They helped me in building datasets, carrying out the


  • experiments, analyzing results and in writing research papers.

    I am grateful to the members of my General Test Committee, Prof. C. Chandra

    Sekhar and Prof. C. S. Ramalingam, for their suggestions and criticisms with

    respect to the presentation of my work. I am also grateful for being a part of the

    CompMusic project. It was a great learning experience working with the members

    of this consortium.

    I would like to thank my music teachers Prof. M.V.N. Murthy and Niveditha

    Bharath. Prof. M.V.N. Murthy patiently taught me to play the instrument,

    Saraswati Veena, in his unique and excellent style. He always encouraged me

    to explore the music beyond what he used to teach in classes which certainly

    manifested my creativity. Madam Nivedita Bharath taught me to sing Carnatic

    music. She is an excellent and a very friendly teacher. Her classes were full of fun

    and excitement. Learning music from these wonderful teachers also helped me to

    better understand the work with respect to this thesis.

    I would like to thank Aashish, Anusha, Asha, Jom, Karthik, Manish, Padma,

    Praveen, Raghav, Rajeev, Sarala, Saranya, Sridharan, Srikanth and other members

    of Donlab for their help and unconditional support over the years. It would have

    been a lonely lab without them. I am also grateful to Alastair, Ajay and Sankalp

    from MTG Barcelona for always clearing my doubts and helping in my research. I

    would also like to acknowledge the help of Kaustuv from IIT Bombay. He always

    found time to answer my questions regarding Hindustani music.

I am also obliged to the European Research Council for funding the research under the European Union's Seventh Framework Programme, as part of the CompMusic project (ERC grant agreement 267583).

    I would like to thank all my friends at IIT Madras without whom the life at IIT


  • campus would have been dry and boring. If not for them, I would have finished

    my thesis much earlier. They have always been a source of refreshment during

    stressful times.

    I would like to thank my parents who have made many sacrifices so that I can

    get a good education and a good life. They have always tolerated my stubborn

    and rebellious nature which I am constantly trying to change. I wish to make them

    proud one day.

    Lastly, I would like to thank my loving brother Anubhav for always being

    an anchor of my life. It was he who has taken the responsibility of financially

    supporting our family at an early age and motivated me to pursue any path I wish

    to choose. I will always be grateful to him and I wish him all the happiness in life.


  • ABSTRACT

KEYWORDS: Carnatic Music, Pattern Discovery, Motif Spotting, Motif Discovery, Raga Verification, Stationary Points, Rough Longest Common Subsequence, Longest Common Segment Set

In Carnatic music, a collective expression of melodies that consists of svaras (ornamented notes) in a well defined order and phrases (aesthetic threads of ornamented notes) that have been formed through the ages defines a raga. Melodic motifs are those unique phrases of a raga that collectively give a raga its identity. These motifs are rendered repeatedly in every rendition of the raga, either compositional or improvisational, so that the identity of the raga is established. Different renditions of a motif make it challenging for a time-series matching algorithm to match them, as they differ slightly from each other. In this thesis, we design algorithmic techniques to automatically find these motifs and their different renditions, and then use the regions rich in these motifs to perform raga verification.

    The initial focus of the thesis is on finding different renditions of melodic

    motifs in an improvisational form of the raga called the alapana. Then we make

    an attempt to automatically discover these motifs from the composition lines. The

    results suggest that composition lines are indeed replete with melodic motifs.

    Using these composition lines, raga verification is performed. In raga verification,

    a melody (a single phrase or an aesthetic concatenation of many such phrases)

    along with a raga claim is supplied to the system. The system confirms or rejects

    the claim.


  • Two algorithms for time-series matching are proposed in this work. One is

    a modification of the existing algorithm, Rough Longest Common Subsequence

(RLCS). The other proposed algorithm, Longest Common Segment Set (LCSS), is completely novel and uses the in-between matched segments to give a holistic score.

    Using the proposed algorithm LCSS, an error rate of 12% is obtained for raga

    verification on a database consisting of 17 ragas.


  • TABLE OF CONTENTS

    ACKNOWLEDGEMENTS i

    ABSTRACT iv

    LIST OF TABLES x

    LIST OF FIGURES xi

    ABBREVIATIONS xii

    NOTATION xiii

    1 Introduction 1

    1.1 Overview of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . 1

    1.2 Contribution of the thesis . . . . . . . . . . . . . . . . . . . . . . . 3

    1.3 Organization of the thesis . . . . . . . . . . . . . . . . . . . . . . . 3

    2 Literature Survey 5

    3 Motif Spotting 20

    3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    3.2 Stationary Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    3.2.1 Method of obtaining Stationary Points . . . . . . . . . . . . 23

    3.3 Rough Longest Common Subsequence Algorithm . . . . . . . . . 25

    3.3.1 Rough match . . . . . . . . . . . . . . . . . . . . . . . . . . 25

    3.3.2 WAR and WAQ for local similarity . . . . . . . . . . . . . . 26

    3.3.3 Score matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

    3.4 Modified-Rough Longest Common Subsequence . . . . . . . . . . 27

    3.4.1 Rough and actual length of RLCS . . . . . . . . . . . . . . 28


  • 3.4.2 RWAR and RWAQ . . . . . . . . . . . . . . . . . . . . . . . 28

    3.4.3 Matched rate on the query sequence . . . . . . . . . . . . . 30

    3.5 A Two-Pass Dynamic Programming Search . . . . . . . . . . . . . 30

    3.5.1 First Pass: Determining Candidate Motif Regions using RLCS 31

    3.5.2 Second Pass: Determining Motifs from the Groups . . . . 32

    3.6 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

    3.7 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.7.1 Querying motifs in the alapanas . . . . . . . . . . . . . . . . 33

3.7.2 Comparison between RLCS and Modified-RLCS using longer motifs . . . . . . . . . . . . . . . 36

    3.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

    3.8.1 Importance of VAD in motif spotting . . . . . . . . . . . . 39

    3.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    4 Motif Discovery 41

    4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    4.2 Lines from the compositions . . . . . . . . . . . . . . . . . . . . . . 44

    4.3 Optimization criteria to find Rough Longest Common Subsequence 44

    4.3.1 Density of the match . . . . . . . . . . . . . . . . . . . . . . 45

    4.3.2 Normalized weighted length . . . . . . . . . . . . . . . . . 46

    4.3.3 Linear trend in stationary points . . . . . . . . . . . . . . . 46

    4.4 Discovering typical motifs of ragas . . . . . . . . . . . . . . . . . . 49

    4.4.1 Filtering to get typical motifs of a raga . . . . . . . . . . . . 49

    4.5 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

    4.6 Experiments and results . . . . . . . . . . . . . . . . . . . . . . . . 52

    4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    5 Raga Verification 56

    5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

    5.2 Dataset used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

    5.2.1 Extraction of pallavi lines . . . . . . . . . . . . . . . . . . . . 58


  • 5.2.2 Selection of cohorts . . . . . . . . . . . . . . . . . . . . . . . 58

    5.3 Longest Common Segment Set Algorithm . . . . . . . . . . . . . . 59

    5.3.1 Common segments . . . . . . . . . . . . . . . . . . . . . . . 60

    5.3.2 Common segment set . . . . . . . . . . . . . . . . . . . . . 62

    5.3.3 Longest Common Segment Set . . . . . . . . . . . . . . . . 62

    5.4 Raga Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

    5.4.1 Score Normalization . . . . . . . . . . . . . . . . . . . . . . 65

    5.5 Performance evaluation . . . . . . . . . . . . . . . . . . . . . . . . 66

    5.5.1 Experimental configuration . . . . . . . . . . . . . . . . . . 66

    5.5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

    5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

    5.6.1 Combining hard-LCSS and soft-LCSS . . . . . . . . . . . . 69

    5.6.2 Reduction of overlap in score distribution by T-norm . . . 69

    5.6.3 Scalability of raga verification . . . . . . . . . . . . . . . . . 70

    5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

    6 Conclusion 71

    6.1 Salient Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

    6.2 Criticism of the work . . . . . . . . . . . . . . . . . . . . . . . . . . 72

    6.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

  • LIST OF TABLES

    2.1 Svaras and their respective ratios to the base pitch S. . . . . . . . 6

    3.1 Dataset of alapanas . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.2 Short Motifs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.3 Long Motifs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.4 Short Motifs: Retrieved regions after the first pass . . . . . . . . . 34

    3.5 Long Motifs: Retrieved regions after the first pass . . . . . . . . . 35

    3.6 Short Motifs: Top 10 retrieved motifs after the second pass . . . . 35

    3.7 Long Motifs: Top 10 retrieved motifs after the second pass . . . . 35

    3.8 Long Motifs: Retrieved regions after the first pass . . . . . . . . . 37

    3.9 Long Motifs: Retrieved regions after the second pass . . . . . . . 38

3.10 Retrieved Groups after both the passes for modified-RLCS without VAD . . . . . . . . . . . . . . . 39

    4.1 D1: Dataset of composition lines . . . . . . . . . . . . . . . . . . . 50

    4.2 D1: Dataset for filtering . . . . . . . . . . . . . . . . . . . . . . . . 50

    4.3 D2: Dataset of composition lines . . . . . . . . . . . . . . . . . . . 51

    4.4 D2: Dataset for filtering . . . . . . . . . . . . . . . . . . . . . . . . 51

    4.5 D1:Similar motifs retrieved from composition lines . . . . . . . . . 52

    4.6 D1:Percentage of motifs preserved after filtering . . . . . . . . . . 53

    4.7 D2:Similar motifs retrieved from composition lines . . . . . . . . . 53

    4.8 D2:Percentage of motifs preserved after filtering . . . . . . . . . . 54

5.1 Details of the database used. Durations are given in approximate hours (h), minutes (m) or seconds (s). . . . . . . . . . . . 58

5.2 EER(%) for different algorithms using different normalizations on different datasets. . . . . . . . . . . . 67


5.3 Number of claims correctly verified by hard-LCSS only, by soft-LCSS only, by both and by neither of them for D1 and D2 using T-norm . . . . . . . . . . . . 69


  • LIST OF FIGURES

2.1 Comparing Pitch Histogram of Raga Sankarabharanam with its Hindustani and Western classical counterparts. . . . . . . . . . . . 7

2.2 Comparing a phrase in raga Sankarabharanam with gamakas and without gamakas . . . . . . . . . . . . 8

2.3 The gamakas in their true form are marked in a pitch contour of a melody . . . . . . . . . . . . 9

2.4 Tonic normalization of two similar phrases in raga sankarabharanam rendered at different tonics. . . . . . . . . . . . 10

2.5 Different renditions of a melodic motif in raga Kalyani and raga Kamboji. . . . . . . . . . . . 11

2.6 Different instances of a melodic motif in an alapana marked in red. 12

2.7 Extraction of stationary points and their interpolation to get a smooth pitch contour. . . . . . . . . . . . 14

    3.1 A Phrase with Stationary Points . . . . . . . . . . . . . . . . . . . . 22

    3.2 The Pitch and Stationary Point Histograms of the raga Kamboji . 23

    3.3 Original and Cubic Interpolated pitch contours . . . . . . . . . . . 24

3.4 a) True positive groups and false alarm groups score distribution for RLCS. b) True positive groups and false alarm groups score distribution for modified-RLCS. . . . . . . . . . . . 37

4.1 RLCS matching two sequences partially . . . . . . . . . . . . 42

4.2 Slopes of the linear trend of stationary points help in reducing the false alarms. The last three phrases are false alarms. . . . . . . . . . . . 47

5.1 An example of a common segment set between two sequences representing the real data . . . . . . . . . . . . 60

5.2 DET curves comparing the LCSS algorithm with different algorithms using different score normalizations . . . . . . . . . . . . 68

    5.3 Showing the effect of T-norm on the score distribution . . . . . . . 70


  • ABBREVIATIONS

    DTW Dynamic Time Warping

    UE-DTW Unconstrained Endpoint - Dynamic Time Warping

    LCS Longest Common Subsequence

    RLCS Rough Longest Common Subsequence

    RCS Rough Common Subsequence

    WAR Width Across Reference

    WAQ Width Across Query

    RWAR Rough Width Across Reference

    RWAQ Rough Width Across Query

    HMM Hidden Markov Model

    LSF Least Squares Fit

    LCSS Longest Common Segment Set

    Z-Norm Zero Normalization

    T-Norm Test Normalization

    EER Equal Error Rate

    VAD Voice Activity Detection


  • NOTATIONS

f Frequency value in hertz
d_{r_i,q_j} Distance between the reference's i-th value and the query's j-th value
T_d Threshold on the distance d_{r_i,q_j}
c_{i,j} Cost of RLCS till the reference's i-th value and the query's j-th value
wr_{i,j} WAR till the reference's i-th value and the query's j-th value
wq_{i,j} WAQ till the reference's i-th value and the query's j-th value
A weight on density
Matching rate
ca_{i,j} Actual length of RLCS till the reference's i-th value and the query's j-th value
wr_{i,j} RWAR till the reference's i-th value and the query's j-th value
wq_{i,j} RWAQ till the reference's i-th value and the query's j-th value
st A semitone in cents
Density of RCS S_XY
lw_{S_XY} Actual length of RCS S_XY
g_X Gaps in sequence X
g_Y Gaps in sequence Y
Threshold on the similarity score
s_X^{S_XY} Slope of the linear trend of stationary points in sequence X
Standard deviation of the linear trend's slope in sequence X
Similarity in the linear trend of stationary points in sequences X and Y
The number of gaps between two hard segments
Penalty issued for each gap
Imposter mean for the claim
Imposter standard deviation for the claim


  • CHAPTER 1

    Introduction

    1.1 Overview of the thesis

    In Carnatic music, a raga is a collective expression of melodies which consists of:

    1. A set of svaras (ornamented notes) ordered in a well defined manner.

2. Phrases (aesthetic threads of ornamented notes) as established by performances through the ages, as rendered in well known compositions.

    While there are some ragas, in particular, for which the first condition suffices,

    in general both these conditions are necessary and are used in practice. The

    phrases that collectively give a raga its identity are called melodic motifs. The

    melodic motifs are unique to a raga. Therefore, in any rendition of the raga, either

    compositional or improvisational, these motifs are rendered in order to establish

the raga's identity. Different renditions of a motif may differ slightly from each other, but these differences are enough to confuse a time-series matching algorithm. The goal

    of the thesis is to design algorithmic techniques to automatically find these motifs,

    their different renditions and, then use the regions replete with these motifs to

    perform raga verification.

    The initial part of the thesis is dedicated towards finding different renditions of

    melodic motifs in an improvisational form of raga called the alapana. This problem

    is known as motif spotting. A melodic motif, preselected by a musician, is used

    as a query and its different renditions are spotted using a matching algorithm.

  • Following this work, inspired by how trained listeners identify ragas, automatic

    discovery of motifs is attempted using certain segments of compositions which are

    supposed to be rich in motifs. Similar phrases are extracted from a number of such

    segments of the compositions in a particular raga. All similar phrases need not be

melodic motifs. Some of them could also appear in other ragas, thus violating the uniqueness property of the motifs. Therefore, these non-motif phrases are filtered out if they are found in composition lines of other ragas. Using this approach, various motifs are discovered for 14 ragas, thus confirming that these segments

    are replete with motifs. Therefore, using these segments of compositions, raga

    verification is performed. In raga verification, a melody (a single phrase or an

    aesthetic concatenation of many such phrases) along with a raga claim is supplied

    to the system. The system confirms or rejects the claim. Raga verification is

    performed by comparing the snippet of audio supplied with various composition

    lines of the claimed raga. The obtained score is matched against the scores obtained

    with composition lines of confusing ragas using score-normalization techniques.
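To make this flow concrete, the following is a minimal Python sketch of a single verification step. It assumes that some time-series matching score (such as the RLCS or LCSS matches described later) is available as a black-box function, and that the normalization is of the simple impostor mean and standard deviation kind suggested by the notation list; the function names and threshold below are illustrative only, not the thesis implementation.

```python
import numpy as np

def verify_raga_claim(test_pitch, claimed_raga_lines, cohort_lines_by_raga,
                      similarity, threshold):
    """Confirm or reject a raga claim for a supplied melody (sketch).

    test_pitch           : pitch contour of the supplied melody (in cents)
    claimed_raga_lines   : composition lines (pitch contours) of the claimed raga
    cohort_lines_by_raga : {raga_name: [pitch contours]} for the confusable ragas
    similarity           : any time-series matching score, e.g. RLCS or LCSS
    threshold            : decision threshold on the normalized score (assumed)
    """
    # Best match of the test melody against the claimed raga's composition lines.
    claim_score = max(similarity(test_pitch, line) for line in claimed_raga_lines)

    # Scores against the cohort (confusable) ragas act as an imposter distribution.
    imposter_scores = [max(similarity(test_pitch, line) for line in lines)
                       for lines in cohort_lines_by_raga.values()]

    # Normalize the claim score against the imposter mean and standard deviation.
    mu_i, sigma_i = np.mean(imposter_scores), np.std(imposter_scores)
    normalized = (claim_score - mu_i) / (sigma_i + 1e-9)

    return normalized >= threshold  # True: claim confirmed, False: claim rejected
```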

    Two algorithms for time-series matching are proposed in this work. One is

    a modification of the existing algorithm, Rough Longest Common Subsequence

    (RLCS). Another proposed algorithm, Longest Common Segment Set (LCSS), is

    completely novel and uses in between matched segments to give a holistic score.

    This algorithm comes in two forms: hard and soft. Hard-LCSS treats individ-

    ual matched segments separately irrespective of their lengths and distribution

    whereas, soft-LCSS can join two or more segments based on their lengths and

    distribution in order to compute a holistic score. Using the proposed algorithms,

    an error rate of 12% is obtained for raga verification on a database consisting of

    17 ragas.


  • 1.2 Contribution of the thesis

    The following are the main contributions of the thesis.

1. A measure based on the stationary points of the pitch contour is introduced that reduces the number of false alarms.

2. Modifications to an existing time-series matching algorithm, known as Rough Longest Common Subsequence, are proposed that reduce the number of false alarms and result in better localization.

3. A new time-series matching algorithm, known as Longest Common Segment Set, is proposed which performs better for the task of raga verification.

4. Approaches are proposed to discover melodic motifs automatically from the composition lines and to find their different renditions.

5. A system is designed to perform raga verification which is scalable to any number of ragas.

    1.3 Organization of the thesis

    The organization of the thesis is as follows: In Chapter 2, a brief background on

Carnatic music is given which is required for a better understanding of the work. Some of the related work on motif spotting, motif discovery and raga verification is also discussed in this chapter.

    Chapter 3 is dedicated towards describing the approach proposed in this the-

    sis to find different renditions of motifs. This chapter describes the quantization

of a pitch contour into stationary points, which preserves most of the raga information. This chapter also describes the modifications made to an existing time-series

    matching algorithm.

    Chapter 4 describes the proposed approach for automatically discovering the

    melodic motifs from the composition lines of the ragas. A measure is defined based

    on the stationary points which reduces the false alarms.


  • Chapter 5 is dedicated towards explaining the raga verification system. Auto-

    matic extraction of composition lines from a given composition is discussed. This

    chapter also describes the concept of cohorts for a raga. A new time-series match-

    ing algorithm, named as Longest Common Segment Set, is also proposed in this

    chapter.

    Finally, Chapter 6 summarizes the work and discusses the possible future work.


  • CHAPTER 2

    Literature Survey

    Carnatic music is an art music (often also referred to as classical music) tradition

commonly associated with four states of Southern India: Andhra Pradesh, Karnataka, Kerala and Tamil Nadu, and also some parts of Maharashtra. It is one of the two

    main sub-genres of Indian classical music. The other sub-genre is Hindustani Music

    which is mainly practiced in North India and also some parts of South India.

    A Carnatic music concert is an ensemble of the main performer (usually a

    vocalist), an accompanist (usually a violinist, occasionally a vainika or flautist) and

    percussionists (a single mridangam vidwan (main percussionist), or an ensemble

of percussionists). If the main percussionist is right handed, s/he sits to the right of the main artist and the violinist sits to the left. The positions are exchanged when

    the mridangam vidwan is left handed. All the performers sit on the stage cross

    legged without any support.

    The first musical sound of a concert is always of a tambura, a drone instrument

    which provides the tonic for the entire concert. The tambura (tanpura) is a string

    instrument that has four strings tuned to three pitches: P-(S)-(S)-S. S is the first

    pitch of an octave whereas P is 1.5 times the pitch of S which makes P the

    seventh pitch of the octave. The two (S)s, being twice the pitch of S, represent the

    first pitch of the second octave. When these four strings are played continuously

    in a conventional manner, the perceived sound, rich in harmonics, provides the

    harmonic base for the performance.

  • Table 2.1: Svaras and their respective ratios to the base pitch S.

Svara          S      R1     R2/G1  R3/G2  G3     M1     M2     P      D1     D2/N1  D3/N2  N3     (S)
Ratio with S   1      16/15  9/8    6/5    5/4    4/3    17/12  3/2    8/5    5/3    9/5    15/8   2
(decimal)      1.000  1.067  1.125  1.200  1.250  1.333  1.417  1.500  1.600  1.667  1.800  1.875  2.000
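The ratios in Table 2.1 become actual pitch values once a tonic is fixed. The short Python sketch below does this for an assumed tonic of 146.8 Hz (roughly D3); the tonic value and the rounding are illustrative only.

```python
import math

# Ratios to the base pitch S, taken from Table 2.1.
SVARA_RATIOS = {
    "S": 1.0, "R1": 16/15, "R2/G1": 9/8, "R3/G2": 6/5, "G3": 5/4,
    "M1": 4/3, "M2": 17/12, "P": 3/2, "D1": 8/5, "D2/N1": 5/3,
    "D3/N2": 9/5, "N3": 15/8, "(S)": 2.0,
}

def svara_table(tonic_hz):
    """Return (svara, frequency in Hz, offset from S in cents) for a given tonic."""
    rows = []
    for svara, ratio in SVARA_RATIOS.items():
        freq = tonic_hz * ratio
        cents = 1200 * math.log2(ratio)   # distance from the tonic in cents
        rows.append((svara, round(freq, 1), round(cents)))
    return rows

# Example: with the tonic S at 146.8 Hz, P comes out as 1.5 x 146.8 = 220.2 Hz.
for row in svara_table(146.8):
    print(row)
```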

The sound of S, also referred to as sruti, is a base pitch (tonic) with respect

    to which all other pitches are defined. These musical pitches in the context of

    Carnatic music are referred to as svaras. S (Sadja) and P (Panchama) are the

two of the seven svaras in an octave; the other five being R (Rishabha), G (Gandhara), M (Madhyama), D (Dhaivata) and N (Nisada). These five svaras have defined

    variability. They take two to three pitch positions in contrast to S and P as

    shown in Table 2.1. These manifestations of a svara, into multiple pitch positions,

are collectively defined as svarasthanas (svara positions) [27]. There are 12 pitch positions within an octave and the total number of svarasthanas is 16. Therefore,

    as shown in Table 2.1, there are overlaps between svarasthanas sharing the same

    pitch position. For example, Chatusruti Rishabha (R2) and Suddha Gandhara (G1)

    share the same pitch position. Therefore, this pitch position can be interpreted as

    one of these two svarasthanas depending on the context.

A svara is not perceived as a single point of frequency although it is referred to as a definitive pitch. It is perceived as movements within a small range of pitch

    values around a dominant mean. Figure 2.1 shows the histogram of pitch values

    in a melody of raga Sankarabharanam and compares it with its Hindustani (raga

    Bilawal) and Western classical (C-Major) counterparts that share the same scale.

    The pitch histogram is continuous for Carnatic music and Hindustani music but

    more or less discrete for Western music. It is clearly seen that the svaras are a range

of pitch values and this range is maximum for Carnatic music. This is because the intonation of a svara within the permissible range cognitively refers to only one svarasthana. For example, when the svarasthana Antara Gandhara (G3) is constantly


• Figure 2.1: Comparing the Pitch Histogram of Raga Sankarabharanam with its Hindustani and Western classical counterparts. (Plot: normalized density versus frequency in cents; curves for the Western classical C Major scale, Hindustani raga Bilawal and Carnatic raga Sankarabharanam, with the svara positions S, R2, G3, M1, P, D2 and N3 marked.)

    moving within a range, it is cognitively recognized as G3 even if it touches upon

    other svarasthanas. This concept where a svara is used to create a variability of

    movement in relation to the phraseology and melodic identity, creating a cognitive

    understanding of the svarasthana, is defined as a gamaka [27]. Therefore, a svara is

    a complete embodiment of svarasthana and the associated gamakas.

    There have been various documentations about the types and number of

    gamakas. In [13] these gamakas are classified into 13 types. A comparison of a

    phrase rendered in raga Sankarabharanam with gamakas and without gamakas is

    shown in Figure 2.2. The phrases are represented as time-frequency trajectories of

pitch values. This trajectory is also referred to as a pitch contour. From Figure 2.2 it is obvious that the deviations of the pitches from the norm are much higher for gamaka laden svaras. The notes of Western classical music are transformed to a


• Figure 2.2: Comparing a phrase in raga Sankarabharanam with gamakas and without gamakas. (Two panels, a) with gamakas and b) without gamakas: frequency in hertz versus time in seconds, tonic E.)

    symbolic notation due to their shorter pitch range. Sometimes, even the improvi-

    sations are also written in symbolic form but the gamaka laden svaras of Carnatic

    music are difficult to express in a symbolic form.

    It is precisely the presence of extensive gamakas that renders developing a sym-

    bolic representation of Carnatic music extremely difficult. It also poses significant

    challenges in the analysis of Carnatic music. These gamakas, however difficult to

represent, form the essential content of a melodic phrase. Since a svara is mostly rendered using gamakas, it was earlier thought that a melodic phrase could be quantized in terms of the 13 gamakas described in [13]. In practice, even though the svaras are sung using these gamakas, they are mostly present in a modified form rather than the true form described in [13]. Figure 2.3 shows the pitch contour

    of a melodic segment. It is clear that the gamakas present in their true form are

    very rare thus making it difficult to quantize a melody in terms of these gamakas.

    If a melody cannot be quantized in a sequence of gamakas, how can a melody be

    represented? Before addressing this question it is important to understand the

    concept of a raga and its various forms of renditions.


  • Figure 2.3: The gamakas in their true form are marked in a pitch contour of a melody

    The concept of a raga is very central to Carnatic music. A raga is a collective

    expression of melodies that consists of gamaka laden svaras and phrases (smaller

    melodic units) as rendered in well known compositions through the ages [26].

    Scale or svara sequence of a raga is defined by its arohana and avarohana. Aro-

    hana corresponds to the ascending order of svaras in the raga whereas avarohana

    corresponds to the descending order of svaras in terms of pitch. Tonic is crucial in

    the identification of a raga. A melody when heard without a referred tonic can be

    perceived as two different ragas depending on the svara that is considered as tonic

    [27].

    Figure 2.4 shows the pitch contours of two similar phrases in raga Sankarab-

    haranam. These phrases are rendered at different tonics. Any time series match-

ing algorithm will give a large error during matching even though these are the same phrases in the same raga but in different tonics. Therefore, before performing

    any kind of matching, the normalization of these phrases with respect to tonic is

    important. Tonic normalization of these phrases is shown in Figure 2.4.

    Raga identification can be done at different levels [27]. In some cases, a svara

    itself, even when rendered without a gamaka, may be sufficient to identify a raga.

    Identification of a raga can also be aided by different expression of a gamaka on a

    svara. Phrases of a raga may also be used to identify it.


• Figure 2.4: Tonic normalization of two similar phrases in raga sankarabharanam rendered at different tonics. (Two panels: the pitch contours in hertz at tonics E and F#, and the same contours after conversion to cents relative to each tonic.)

    A phrase is an aesthetic thread of the articulated and the unarticulated svaras

    in a raga. Phrases that collectively give a raga its identity are called melodic motifs.

    Each time a musician renders a phrase, its form varies even though the core identity

    is recognized. Figure 2.5 shows different renditions of a melodic motif in raga

Kalyani and raga Kamboji. The renditions differ slightly from each other, but these differences are enough to confuse a time-series matching algorithm. These improvisations

    should not make the phrase sound like a different raga. Sometimes, even with a

little improvisation, a phrase is perceived as a different raga. An example in [27] states that the phrase P D1 N2 D1 P M1 with an elongated N2 is common to ragas Thodi

    and Bhairavi. A gamaka on M1 makes it sound like Bhairavi and without the gamaka

    it sounds like raga Thodi. Although phrases can be sung at different speeds, for

    some phrases an increase in speed constricts the rendition of gamakas in a svara

    which can result in a different raga [27]. Other than variations within a phrase, the

    way each phrase connects to another also changes but the raga remains the same.

    A raga is rendered in various compositional and improvisational forms. Almost

    all compositional forms start with a section called pallavi. The pallavi is usually

    made up of one or two lines but is rich with melodic motifs of the raga [26]. The

    anupallavi is the second section of the composition. In anupallavi, the melodic

    movements in the higher octaves, with reference to the tonic, are present [26]. If


• Figure 2.5: Different renditions of a melodic motif in raga Kalyani and raga Kamboji.

    the anupallavi is present in a composition, it is always rendered after the pallavi

    before any other section. Charanam is another section found in most compositions

    which has a variable length depending on the type of composition [26].

    There are many improvisational forms in Carnatic music like alapana, tanam, ni-

    raval, kalpana svara, etc. We will discuss only alapana in detail as it is relevant to the

work. Alapana is generated by the musician's distinctive imagination and creativity. Alapana is the opening of a raga and brings out all the aspects of the raga without

    using other elements like tala. Every alapana begins with a phrase (melodic motif)

    that clearly establishes the identity of the raga. Once the identity is established, mu-

    sicians tend to further explore the raga. This exploration leads to small variations

    that start appearing in the renditions of svaras and phrases in the form of gamakas

    or slight deviations from known phrases. These variations lead to newer phrases

    in the raga that, over a period of time, can be used to identify the raga and can be

called melodic motifs of that raga [26]. The possible ways to move from one known phrase to another are numerous. The musician exploits this gap between two known phrases and aesthetically connects them with a new phrase [26].


• Figure 2.6: Different instances of a melodic motif in an alapana marked in red. (Panels: the pitch contour of an alapana, frequency in cents versus time in minutes, and three enlarged motif instances, Motif1, Motif2 and Motif3, frequency in cents versus time in seconds.)

    Therefore, the raga while having an aesthetic core is also an evolving entity

    through endless improvisation. In spite of this evolution, the identity of the raga

    remains intact in most cases. A raga is much like an evolving personality while

    the person remains the same. An example of an alapana showing the instances of

known phrases (melodic motifs) is shown in Figure 2.6. The melody discussed earlier in Figure 2.3 is also from an alapana, which made it clear that a melody in a raga cannot be quantized in terms of the 13 gamakas described in [13]. Now we will address the question asked earlier: if a melody cannot be quantized into a sequence of gamakas, how can a melody be represented? We know that every raga consists

    of the well known phrases (melodic motifs) that are unique to that raga and can

    be used to identify it. These phrases are also referred to as characteristic motifs,

    distinctive motifs and typical motifs. In any rendition of a raga, it is required that

    the characteristic motifs are rendered in order to establish the identity of the raga.

    If the motifs in a recording can be located, then these motifs can be used to index

    the recording. The focus of the initial part of the thesis is on locating motifs (as

    defined by a musician) in a continuous alapana.


  • In [20], the uniqueness of these characteristic motifs was established using a

    closed set motif recognition experiment using Hidden Markov Model (HMM).

    Following this work, we attempt to spot motifs given a long alapana interspersed

with motifs. From Figure 2.5, it is clear that motifs that seem identical from a perception perspective appear quite different (visually) when viewed as time

    series. Time series motif recognition has been attempted for Hindustani music.

    In [39], the onset point of the rhythmic cycle, emphasized by the beat of the tabla

    (an Indian percussion instrument), is used as a cue for potential motif regions. In

    another work [40], motif spotting is attempted in a Bandish (a type of composition

    in Hindustani music) using elongated notes (nyaas svara).

    Spotting motifs in a raga alapana is equivalent to finding a subsequence in a

    time-frequency trajectory of the alapana. Interestingly, the duration of these motifs

    may vary, but the relative duration of the svaras is preserved across the motif.

    The attempt in this thesis is to use pitch contours as a time series and employ

    time series pattern capturing techniques to identify the motif. The techniques

    are customized to use the properties of Carnatic music. There has been work

    done on time series motif recognition in fields other than music. In [36], a time

    series motif is defined and motif discovery is attempted using the Enumeration of

    Motifs through Matrix Approximation (EMMA) algorithm. In [4] and [28], time

    series motifs are discovered by adapting the random projection algorithm to time

    series data. In [3], a new warping distance called Spatial Assembling distance is

    defined and used for pattern matching in streaming data. In [30], music matching

    is attempted using a variant of the Longest Common Subsequence (LCS) algorithm

    called Rough Longest Common Subsequence (RLCS).

    Chapter 3 attempts similar time series motif matching for Carnatic Music.


Figure 2.7: Extraction of stationary points and their interpolation to get a smooth pitch contour.

    Searching for a 2-3 second motif (in terms of a pitch contour) in a 10 min alapana

    (also represented as a pitch contour) can be erroneous, owing to pitch estimation

    errors. To address this issue, the pitch contour of the alapana is first quantized

    to a sequence of stationary points (points in the pitch contour where the first-

    derivative is 0), as shown in Figure 2.7, which are meaningful in the context of a

    raga. The meaningfulness of these stationary points is validated by 13 listeners. In

    order to validate, stationary points were interpolated using cubic B-splines. The

    pitch trajectory corresponding to that of the interpolated curve was then used to

generate the melody. A similarity test was then performed to determine whether the original melodic segments and the melodic segments generated after interpolation were indeed similar. A very high similarity score of 7 out of 10 was obtained.

    The examples presented to the listeners for validation are available online1.

    To determine the location of the motif, a two-pass search is performed. In

    the first pass, Rough Longest Common Subsequence approach with modifications

    is used to find the region corresponding to the location of the motif using the

1. http://www.iitm.ac.in/donlab/motif_analysis.html

  • stationary points. Once the region is located, another pass is made on this region

    using the raw pitch contour instead of stationary points. Although the results

    using this approach were very promising, it required that musicians first identify

typical motifs manually. It was also observed that the number of false alarms was significantly high. Also, the correlation amongst musicians with respect to correct phrases was as high as 0.8 while, for false alarms, the correlation was as low as 0.4. High ranking false alarms were primarily due to partial matches with the given query. Many of these were considered as an instance of the queried motif by some musicians. Initially, motifs with shorter duration were used, and for these shorter motifs the inconsistency was high. Due to these problems, the scalability of this approach to more ragas was limited. This also illustrates that the notion of a typical motif itself is questionable. Nevertheless, there is a core using which the audience identifies ragas quickly and easily. The rest of the thesis focuses on this

    ability of listeners.

    As alapana is an improvisational segment, the rendition of the same motif could

    be different across alapanas especially among different schools. On the other hand,

    compositions in Carnatic music are rendered more or less in a similar manner. Al-

    though the music evolved through the oral tradition and fairly significant changes

    have crept into the music, renditions of compositions do not vary very significantly

    across different performers and schools. The number of variants for each line of the

    song can vary quite a lot though. Nevertheless, the typical motifs and the metre

    of motifs will be generally preserved. An attempt is therefore made, to determine

    the typical motifs automatically.

    It is discussed in [32] that not all repeating patterns are interesting and relevant.

    In fact, the vast majority of exact repetitions within a music piece are not musically


  • interesting. The algorithm proposed in [32] mostly generates interesting repeating

    patterns along with some non-interesting ones which are later filtered during post

    processing. This work is an attempt from a similar perspective. The only difference

    is that typical motifs of ragas need not be interesting to a listener. The primary

    objective for discovering typical motifs, is that these motifs can be used to index

    the audio of a rendition. For example, as discussed earlier, while performing an

    alapana of a raga, musicians bridge two well known motifs of that raga with new

    phrases using their creativity. These new phrases are musically more interesting

    as they are the result of an ever evolving raga. The known typical motifs can be

    used to index the alapana and the new phrases connecting them could be extracted.

    Typical motifs could also be used for raga classification.

    In Carnatic music, the composition still holds a very important position. Many

    artists change the phrases in the alapana based on the composition that is likely

    to follow. The proposed approach in this work generates similar patterns across

    composition lines of a raga. From these similar patterns, the typical motifs are

    filtered by using composition lines of other ragas. Motifs are considered typical of

    a raga if they are present in the composition lines of a given raga and absent from

composition lines of other ragas. This filtering approach is similar to the anti-corpus approach of Conklin [8, 9] for the discovery of distinctive patterns.
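A minimal sketch of this filtering idea is given below. It assumes a generic similarity function (for example an RLCS-style match) and a similarity threshold; the names and the exact decision rule are illustrative rather than the thesis implementation.

```python
def filter_typical_motifs(candidate_motifs, other_raga_lines, similarity, sim_threshold):
    """Anti-corpus style filtering of candidate motifs (sketch).

    candidate_motifs : similar phrases already discovered across the composition
                       lines of the raga under consideration
    other_raga_lines : composition lines of the other ragas (the anti-corpus)
    similarity       : a matching score such as RLCS; sim_threshold marks a "hit"
    """
    typical = []
    for motif in candidate_motifs:
        # A candidate is kept as a typical motif only if it is absent from the
        # composition lines of every other raga.
        found_elsewhere = any(similarity(motif, line) >= sim_threshold
                              for line in other_raga_lines)
        if not found_elsewhere:
            typical.append(motif)
    return typical
```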

    Most of the earlier work, regarding discovery of repeated patterns of interest in

music, is on Western music. In [22], B. Jansen et al. discuss the current approaches to repeated pattern discovery. It discusses string based methods and geometric methods for pattern discovery. In [31], Lie Lu et al. used constant Q transforms and proposed a similarity measure between musical features for doing repeated pattern discovery. In [32], Meredith et al. presented Structure Induction Algorithms (SIA) using a geometric approach for discovering repeated patterns that are musically interesting to the listener. In [6, 7], Collins et al. introduced improvements to Meredith's Structure Induction Algorithms. There has also been some significant work on detecting melodic motifs in Hindustani music by Joe Cheri Ross et al.

    [39]. In this approach, the melody is converted to a sequence of symbols and a

    variant of dynamic programming is used to discover the motif.

As mentioned before, the typical motifs can be used for raga classification, but when the number of ragas increases, the scalability of this approach becomes an

    issue. In Chapter 5, inspired by how the listener tries to identify raga during a

    concert, an attempt is made to mimic the same. During a concert, the performer

    usually begins by establishing the identity of the raga. When the musician is

    establishing the identity by rendering the raga, the listener narrows down the

search space from hundreds of ragas to a small likely subset of ragas. By further listening to the musician, the listener identifies the peculiarities, matches them with the shortlisted ragas and finally identifies the raga. First, to mimic the reduction of the search space, a raga recording is presented with a claim. The claim is the raga with which a listener has associated the recording. For every raga, a set of cohorts is identified

    by a musician. Cohorts are ragas that have similar phrases and can be confused

with the given raga. The cohort raga list is used to reduce the search space. The task that remains is to determine whether the claimed raga is correct. This is done by using a novel matching algorithm known as Longest Common Segment Set (LCSS) along with score normalization.

    There is no parallel in Western classical music to raga verification. The closest

task one can associate with it is cover song detection [14, 33, 43], where the objective is to detect the same song rendered by different musicians. In contrast, as discussed


  • in Chapter 2, two different renditions of the same motif may not be identical.

    Several attempts have been made earlier to identify ragas [5, 11, 12, 18, 20, 25, 29, 47].

    Most of these efforts have used small repertoires or have focused on ragas for which

    ordering is not important. In [47], the audio is transcribed to a sequence of notes

    and string matching techniques are used to perform raga identification. In [5],

    pitch-class and pitch-dyads distributions are used for identifying ragas. Bigrams

    on pitch are obtained using a twelve semitone scale. In [35], the authors assume that

    an automatic note transcription system for the audio is available. The transcribed

    notes are then subjected to HMM based raga analysis. In [25, 46], a template based

    on the arohana and avarohana is used to determine the identity of the raga. The

    frequency of the svaras in Carnatic music is seldom fixed. Further, as indicated

    in [48] and [49], the improvisations in extempore enunciation of ragas can vary

    across musicians and schools. This behaviour is accounted for in [23, 24, 29] by

    decreasing the binwidth for computing melodic histograms. In [29], steady note

    transcription along with n-gram models is used to perform raga identification. In

    [11] chroma features are used in an HMM framework to perform scale independent

raga identification, while in [12] a hierarchical random forest classifier is used to

    match svara histograms. The svaras are obtained using the Western transcription

    system. These experiments are performed on 4 to 8 different ragas of Hindustani

music. In [18], an attempt is made to perform raga identification using semi-continuous Gaussian mixture models. This will work only for ragas with a linear

    ordering of svaras.

    Recent research indicates that a raga is characterised best by a time-frequency

    trajectory rather than a sequence of quantised pitches [20, 38, 39, 45]. In [38, 39],

    the sama of the tala (emphasised by the bol of tabla) is used to segment a piece. The

    repeating pattern in a bandish in Hindustani Khayal music is located using the


  • sama information. In [20, 38], motif identification is performed for Carnatic music.

    Motifs for a set of five ragas are defined and marked carefully by a musician.

    Motif identification is performed using hidden Markov model (HMM) trained

    for each motif. Similar to [39], motif spotting in an alapana in Carnatic music is

    performed in Chapter 3. In [45], a number of different similarity measures for

matching melodic motifs of Indian music were attempted. It was shown that the

    intra pattern type variance of the melodic motifs is higher for Carnatic music in

    comparison with that of Hindustani music. It was also shown that the similarity

    obtained is very sensitive to the measure used. All these efforts are ultimately

    aimed at obtaining typical signatures of ragas. It is shown in Chapter 3 that there

can be many signatures for a given raga. To alleviate this problem, in Chapter 4 an attempt was made to obtain as many signatures as possible for a raga by comparing lines

    of compositions. Here again, it was observed that the typical motif detection was

    very sensitive to the distance measure chosen. Using typical motifs/signatures for

raga identification is not scalable when the number of ragas under consideration increases. In raga verification, as the task of identifying ragas narrows to a small number of ragas, it is scalable to any number of new ragas.


  • CHAPTER 3

    Motif Spotting

    3.1 Introduction

    A raga in Carnatic music can be characterised by a set of distinctive motifs. Dis-

    tinctive motifs can be characterised by the trajectory of inflected svaras over time.

    These motifs are of utmost aesthetic importance to the raga. Carnatic music is

    a genre abundant with compositions. These compositions are replete with many

    distinctive motifs. These motifs are used as building blocks for extempore improvi-

    sational pieces in Carnatic music. These motifs can also be used for distinguishing

    between two ragas, and also for archival and learning purposes. The objective of

    the work presented in this chapter is to spot the location of the distinctive motifs in

    an extempore enunciation of a raga called the alapana. In Carnatic music, the motifs

    are laden with gamakas [27]. In addition, the motifs are similar across musicians

    but not necessarily identical. The duration of the motifs can also vary quite signif-

    icantly although the rhythm may be preserved. The query motif in general is very

    short in duration compared to that of the test music segment. Several factors need

to be considered when dealing with this problem, namely: selection of features, time complexity, tolerance to noise, tolerance to speed variation, allowing partial matches or rough matches rather than exact matches, timbre, etc. [17, 30, 50].

    In this chapter, pitch is used as the main feature for the task of motif spot-

    ting. Substantial research exists on analysing different aspects of Carnatic music

    computationally, using pitch as a feature. In [28], gamakas are characterized and

  • analysed using pitch contours. In [21], tuning of Indian classical music is studied

    using pitch histograms. In [48], the motifs are extensively studied in the raga Thodi

    using pitch histograms and pitch contours. All of the above prove the relevance

    and importance of pitch as a feature for computational analysis of Carnatic music.

There are a number of dynamic programming techniques, namely the Dynamic

    Time Warping (DTW), the Longest Common Subsequence (LCS) and their variants,

    which are used for similar music matching tasks. DTW takes care of the speed

    variations due to warping but forces the match from end-to-end of both the query

    and the test sequences. Even unconstrained endpoint DTW will align an entire

    query with a part of the test sequence [16]. In motif-spotting, there can be instances

    where one can expect that most of the query is roughly matched with a part of

    the test sequence. Although LCS does not force the match between query and test

    to be end-to-end, it does not give importance to local similarity. Rough Longest

    Common Subsequence (RLCS) addresses the issue of local similarity where some

    leeway is given for partial query matches [30]. Other than partial query matches,

    when the characteristic motif, for example, Sa Ni Da Pa Da is rendered as Sa

    Ri Ni Da Pa Da, RLCS gives a good match since it gives the longest matched

    subsequence.

    3.2 Stationary Points

    The task therefore is to attempt automatic spotting of a motif that is queried. The

    motif is queried against a set of alapanas of a particular raga to obtain locations of

    the occurrences of the motif. The task is non-trivial since no particular rhythm

is maintained in an alapana, nor is it accompanied by a percussion instrument.


  • Figure 3.1: A Phrase with Stationary Points

    Figure 2.6 shows repetitive occurrences of motifs in a piece of music. An enlarged

    view of the motif is also shown. Since the alapana is much longer than the motif,

    searching for a motif in an alapana is like searching for a needle in a haystack. After

    an analysis of the pitch contours and discussions with professional musicians, it

    was conjectured that the pitch contour can be quantized at stationary points. The

    conjecture was confirmed as explained in Chapter 2. Figure 3.1 shows an example

    phrase of the raga Kamboji with the stationary points highlighted.

    Musically, the stationary points are a measure of the extent to which a particular

    svara is intoned. In Carnatic music since svaras are rendered with gamakas, there is

    a difference between the notation and the actual rendition of the phrase. However,

    there is a one to one correspondence with the stationary point frequencies and

    what is actually rendered by the musician (Figure 3.1). Figure 3.2 shows the pitch

    histogram and the stationary point histogram of an alapana of the raga Kamboji.

    The similarity between the two pitch histograms vindicates our conjecture that

    stationary points are important.

Figure 3.2: The Pitch and Stationary Point Histograms of the raga Kamboji

    3.2.1 Method of obtaining Stationary Points

    Carnatic music is a heterophonic musical form. In a Carnatic music vocal concert,

    a minimum of two accompanying instruments play simultaneously along with

    the lead artist. These are the violin and the mridangam (a percussion instrument

    in Carnatic music). Carnatic music is performed at a fixed tonic[2] to which all

    instruments are tuned. The tonic is chosen by the lead artist and is maintained

    throughout the performance by an instrument called the Tambura as discussed in

    Chapter 2. The simultaneous performance of many instruments in addition to the

    voice renders pitch extraction of the predominant voice a tough task. This leads

    to octave errors and other erroneous pitch values. For this task it is necessary that

    pitch be continuous. After experimenting with various pitch algorithms, it was

    observed that the Melodia-Pitch Extraction algorithm [41] produced the fewest

    errors. This was verified after re-synthesis using the pitch contours. In case of

    an octave error or any other such pitch related anomaly, the algorithm replaces

    the erroneous pitch values with zeros. The stationary points are obtained by

    processing the pitch contour extracted from the waveform. The pitch extracted

Figure 3.3: Original and Cubic Interpolated pitch contours

    is converted to the cent scale using (3.1) to normalise with respect to the tonic of

    different musicians.

centFrequency = 1200 \log_2\left(\frac{f}{f_{tonic}}\right) \qquad (3.1)
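As a small illustration (not part of the thesis implementation), the conversion in (3.1) can be written as:

```python
import numpy as np

def hz_to_cents(f_hz, tonic_hz):
    """Convert pitch values in Hz to cents relative to the tonic, as in (3.1)."""
    return 1200.0 * np.log2(np.asarray(f_hz, dtype=float) / tonic_hz)

# A pitch one octave above the tonic maps to 1200 cents.
print(hz_to_cents([146.8, 293.6], tonic_hz=146.8))  # [0. 1200.]
```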

    Least Squares Fit (LSF)[37] was used to compute the slope of the pitch extracted.

    The zero crossings of the slope correspond to the stationary points (Figure 3.1). A

    Cubic Hermite interpolation[15] was then performed with the initial estimation of

    stationary points to get a continuous curve (Figure 3.3). The stationary points are

    then again estimated from this continuous curve.
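A rough sketch of this procedure is given below, assuming a voiced, tonic-normalised pitch contour sampled at a uniform hop. The window length for the least-squares slope and the choice of interpolation routine are illustrative assumptions, not the exact settings used in the thesis.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator  # monotone cubic Hermite interpolation [15]

def stationary_points(cents, half_win=5):
    """Estimate stationary points of a pitch contour given in cents.

    The local slope is obtained from a least-squares line fit in a short
    window; zero crossings of the slope are taken as stationary points."""
    cents = np.asarray(cents, dtype=float)
    n = len(cents)
    t = np.arange(n)
    slope = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_win), min(n, i + half_win + 1)
        slope[i] = np.polyfit(t[lo:hi], cents[lo:hi], 1)[0]   # least-squares slope
    idx = np.where(np.diff(np.sign(slope)) != 0)[0]           # slope zero crossings
    return idx, cents[idx]

def interpolate_through(cents, idx):
    """Cubic Hermite interpolation through the stationary points (cf. Figure 3.3)."""
    cents = np.asarray(cents, dtype=float)
    t = np.arange(len(cents))
    return PchipInterpolator(t[idx], cents[idx])(t)
```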

3.3 Rough Longest Common Subsequence Algorithm

    Rough Longest Common Subsequence (RLCS), a variant of Longest Common

    Subsequence (LCS), performs an approximate match between a reference sequence

    and a query sequence while retaining the local similarity [30]. It introduces three

major changes to LCS, namely: a rough match; width-across-reference (WAR) and
width-across-query (WAQ) for local similarity; and a score matrix.

    3.3.1 Rough match

    In the recurrence function of LCS, the cost function is incremented by 1 when there

is an exact match. In RLCS, when the distance between a reference point, $r_i$, and
a query point, $q_j$, is less than a threshold, $T_d$, they are said to be roughly matched,
denoted $r_i \approx q_j$, i.e. $d(r_i, q_j) < T_d \Rightarrow r_i \approx q_j$, where $d(r_i, q_j)$ is the distance between $r_i$ and $q_j$.
The cost is incremented by a number, $\sigma_{i,j}$, between 0 and 1 instead of 1, based on
how good the match is, as shown in (3.2).

\sigma_{i,j} = 1 - \frac{d(r_i, q_j)}{T_d} \qquad (3.2)

    The cost is estimated using the following recurrence:

c_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
c_{i-1,j-1} + \sigma_{i,j} & \text{if } r_i \approx q_j \\
\max(c_{i-1,j},\, c_{i,j-1}) & \text{if } r_i \not\approx q_j
\end{cases}
\qquad (3.3)
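A minimal sketch of (3.2) and (3.3) follows; the variable names (including sigma for the rough-match weight) are illustrative choices of this sketch, not the thesis's code.

```python
import numpy as np

def rough_match_weight(d, Td):
    """Weight sigma of (3.2): 1 for a perfect match, falling to 0 at the threshold."""
    return 1.0 - d / Td if d < Td else 0.0

def rlcs_cost(ref, query, dist, Td):
    """Cost matrix of (3.3): the 'rough length' of the RLCS of ref and query."""
    n, m = len(ref), len(query)
    c = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], query[j - 1])
            if d < Td:                        # rough match
                c[i, j] = c[i - 1, j - 1] + rough_match_weight(d, Td)
            else:                             # no match: carry the best neighbour
                c[i, j] = max(c[i - 1, j], c[i, j - 1])
    return c
```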

    In LCS the cost gives the length of the Longest Common Subsequence. The cost

    of RLCS is not incremented by 1 but it represents the length of the Rough Longest

Common Subsequence. Later, it is argued that this length is actually a rough length

    of RLCS rather than its actual length.

    3.3.2 WAR and WAQ for local similarity

    To retain the local similarity, width-across-reference, WAR, and width-across-

    query, WAQ, are used. WAR and WAQ represent the length of the shortest

    substring of the reference and the query respectively, containing the LCS. These

    measures represent the density of LCS in the reference and the query. Small values

    of WAR and WAQ indicate a dense distribution of LCS. WAR is incremented by 1

    if there is a rough match or jump along the reference. Likewise for the WAQ. WAR

    and WAQ are computed using the following recurrences:

wr_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
wr_{i-1,j-1} + 1 & \text{if } r_i \approx q_j \\
wr_{i-1,j} + 1 & \text{if } r_i \not\approx q_j,\ c_{i-1,j} \ge c_{i,j-1} \\
wr_{i,j-1} & \text{if } r_i \not\approx q_j,\ c_{i-1,j} < c_{i,j-1}
\end{cases}
\qquad (3.4)

wq_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
wq_{i-1,j-1} + 1 & \text{if } r_i \approx q_j \\
wq_{i-1,j} & \text{if } r_i \not\approx q_j,\ c_{i-1,j} \ge c_{i,j-1} \\
wq_{i,j-1} + 1 & \text{if } r_i \not\approx q_j,\ c_{i-1,j} < c_{i,j-1}
\end{cases}
\qquad (3.5)

    In (3.4) and (3.5), some of the cases and conditions are dropped from [30] for

    the sake of clarity.

3.3.3 Score matrix

    WAR, WAQ and cost are used to compute the score of a common subsequence in

    the following way:

Score_{i,j} =
\begin{cases}
\left(\beta\,\frac{c_{i,j}}{wr_{i,j}} + (1-\beta)\,\frac{c_{i,j}}{wq_{i,j}}\right)\frac{c_{i,j}}{n} & \text{if } c_{i,j} \ge \tau n \\
0 & \text{otherwise}
\end{cases}
\qquad (3.6)

In (3.6), a large value of $\frac{c_{i,j}}{wr_{i,j}}$ suggests that the density of the RLCS is high in the
reference. Similarly, a large value of $\frac{c_{i,j}}{wq_{i,j}}$ is indicative of a higher density of the RLCS in
the query. $\beta$ weighs between these two ratios. A large value of $\frac{c_{i,j}}{n}$ indicates that a
large part of the query has been matched, where n is the length of the query. $\tau$ is
the matching rate that represents how long the length of the RLCS should be, with
respect to the query.

    The algorithm to compute these values using Dynamic Programming is pre-

    sented in [30].
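For concreteness, one way the tables of (3.3)-(3.6) could be filled is sketched below. The bookkeeping in [30] (direction matrix, tie handling) is more detailed, and beta and tau below stand for the weighting and matching-rate parameters written as β and τ in the equations above; all names are assumptions of this sketch.

```python
import numpy as np

def rlcs_tables(ref, query, dist, Td, beta=0.5, tau=0.8):
    """Fill the cost (3.3), WAR (3.4), WAQ (3.5) and score (3.6) tables.

    Here m is the query length (n in the thesis notation)."""
    n, m = len(ref), len(query)
    c = np.zeros((n + 1, m + 1))
    war = np.zeros((n + 1, m + 1))
    waq = np.zeros((n + 1, m + 1))
    score = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], query[j - 1])
            if d < Td:                               # rough match
                c[i, j] = c[i - 1, j - 1] + (1.0 - d / Td)
                war[i, j] = war[i - 1, j - 1] + 1
                waq[i, j] = waq[i - 1, j - 1] + 1
            elif c[i - 1, j] >= c[i, j - 1]:         # advance along the reference
                c[i, j] = c[i - 1, j]
                war[i, j] = war[i - 1, j] + 1
                waq[i, j] = waq[i - 1, j]
            else:                                    # advance along the query
                c[i, j] = c[i, j - 1]
                war[i, j] = war[i, j - 1]
                waq[i, j] = waq[i, j - 1] + 1
            if c[i, j] >= tau * m and war[i, j] > 0 and waq[i, j] > 0:
                score[i, j] = (beta * c[i, j] / war[i, j]
                               + (1 - beta) * c[i, j] / waq[i, j]) * c[i, j] / m
    return c, war, waq, score
```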

    3.4 Modified-Rough Longest Common Subsequence

    In this section, the modifications made to the existing RLCS algorithm and the

    rationale behind them are discussed.

3.4.1 Rough and actual length of RLCS

In [30], $c_{i,j}$ is defined as the length of the RLCS. But it actually represents a rough
length of the RLCS because it is incremented by $\sigma_{i,j}$ when there is a rough match. The
resulting value of $c_{i,j}$ need not be an integer. Therefore, it cannot be the actual

    length of any sequence. The actual length of RLCS is defined by the following

    recurrence:

ca_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
ca_{i-1,j-1} + 1 & \text{if } r_i \approx q_j \\
\max(ca_{i-1,j},\, ca_{i,j-1}) & \text{if } r_i \not\approx q_j
\end{cases}
\qquad (3.7)

    In (3.7), cost is incremented by 1 on a rough match. In (3.6), while computing score,

    half of the importance is given to the ratio of rough length of RLCS and the query

    length. Instead of just considering how good the rough length of the RLCS with

    respect to the query length is, it is conjectured that it is also important to consider

    how good the rough length of the RLCS is with respect to the actual length of the

RLCS.

\frac{c_{i,j} + c_{i,j}}{ca_{i,j} + n}

    gives equal importance to both the ratios. This term is similar to the F1 score where

    precision and recall are given equal importance.

    3.4.2 RWAR and RWAQ

    WAR and WAQ represent the width of the shortest substring that contains the

    RLCS. As discussed in the previous subsection, ci, j represents the rough length

which is shorter than the actual length of the RLCS. Therefore, it is not clear
whether $\frac{c_{i,j}}{wr_{i,j}}$ really represents the density of the RLCS in the reference. This term
also penalizes based on the degree of match, while a penalty has already been
accounted for in the term $\frac{c_{i,j}}{n}$. Therefore, a rough width across reference and query

    is required that represents the rough width of the shortest substring containing

the RLCS. On a rough match, the cost is incremented by $\sigma_{i,j}$. At the same time,
when a rough match is obtained, the WAR and WAQ are also incremented by $\sigma_{i,j}$,

    resulting in Rough WAR (RWAR) and Rough WAQ (RWAQ), respectively. When

    there is no match, RWAR and RWAQ are incremented by 1 whereas the cost is not

    incremented. Therefore, RWAR and RWAQ account for the density of the RLCS

    in the reference and query better. RWAR and RWAQ can be computed by the

    following recurrences:

rwr_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
rwr_{i-1,j-1} + \sigma_{i,j} & \text{if } r_i \approx q_j \\
rwr_{i-1,j} + 1 & \text{if } r_i \not\approx q_j,\ c_{i-1,j} \ge c_{i,j-1} \\
rwr_{i,j-1} & \text{if } r_i \not\approx q_j,\ c_{i-1,j} < c_{i,j-1}
\end{cases}
\qquad (3.8)

rwq_{i,j} =
\begin{cases}
0 & \text{if } i \cdot j = 0 \\
rwq_{i-1,j-1} + \sigma_{i,j} & \text{if } r_i \approx q_j \\
rwq_{i-1,j} & \text{if } r_i \not\approx q_j,\ c_{i-1,j} \ge c_{i,j-1} \\
rwq_{i,j-1} + 1 & \text{if } r_i \not\approx q_j,\ c_{i-1,j} < c_{i,j-1}
\end{cases}
\qquad (3.9)

3.4.3 Matched rate on the query sequence

In (3.6), $\tau$ is an empirical parameter that is set based on the required match rate on
the entire query sequence. The score is updated by a non-zero value only if the
rough length of the RLCS is greater than $\tau n$. It is not clear how to set the value
of $\tau$, or what it means for the rough length to be greater than a fraction of the query
length. Instead, it would be better to update the score by a non-zero value if the
actual length is greater than $\tau n$. This makes the interpretation clear and makes
it easy to set the value of $\tau$.

    The score update of the modified-RLCS is given by the following equation:

Score_{i,j} =
\begin{cases}
\left(\beta\,\frac{c_{i,j}}{rwr_{i,j}} + (1-\beta)\,\frac{c_{i,j}}{rwq_{i,j}}\right)\frac{c_{i,j}+c_{i,j}}{ca_{i,j}+n} & \text{if } ca_{i,j} \ge \tau n \\
0 & \text{otherwise}
\end{cases}
\qquad (3.10)

    3.5 A Two-Pass Dynamic Programming Search

In Section 3.2 it is illustrated that the sequence of stationary points is crucial for

    a motif. Therefore, RLCS is used to query for the stationary points of the given

    motif in the alapana.

    Music matching using LCS methods for western music is performed on sym-

    bolic music data[19]. The musical notes in this context are the symbols. However,

    in the context of Carnatic music, there is no consistent one to one correspondence

    between the notation and the sung melody. Although, in this work, stationary

    points are used instead of a symbolic notation, one must keep in mind that sta-

    tionary points are not symbols but are continuous pitch values. In order to match

    such pitch values, a rough match instead of an exact match is required. A variant

of the LCS known as the Rough Longest Common Subsequence [30] allows such a

    rough match.

    In this work, a two pass RLCS matching is performed. In the first pass, the

    stationary points of the reference sequence and the query sequence are matched to

    obtain the candidate motif regions. Nevertheless, given two consecutive stationary

    points, the pitch contour between these two stationary points can be significantly

    different for different phrases. This leads to many false alarms. A second pass of

    RLCS is then performed on the regions obtained from the first pass to filter out the

    false alarms from the true motifs.

    3.5.1 First Pass: Determining Candidate Motif Regions using

    RLCS

    The RLCS algorithm used in this work is illustrated in this section. The alapana is

    first windowed and then processed with the RLCS algorithm. The window size

    chosen for this task is 1.5 times the length of the motif queried for. The matrices

    obtained from the RLCS are then processed as follows:

- From the cells of the score matrix with values greater than a threshold, seqFilterTd, sequences are obtained by tracing the direction matrix backwards.

- The duplicate sequences which may be acquired are neglected, preserving unique sequences of length greater than $\tau$ times the length of the reference. These are then added to a sequence buffer.

- This process is repeated for every window. The window is shifted by a hop of one stationary point.

- The sequences obtained thus are grouped.

- Each group, taken from the first element of the first member to the last element of the last member, represents a potential motif region.
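A schematic outline of this first pass is sketched below; the windowing and grouping follow the steps above, while the inner matching routine (match_window) and its thresholds are left abstract and are assumptions of the sketch.

```python
def first_pass_candidates(query_sp, alapana_sp, match_window, hop=1, win_factor=1.5):
    """First pass over the stationary points of an alapana.

    query_sp, alapana_sp : sequences of stationary-point pitch values (cents)
    match_window         : callable returning a list of (start, end, score)
                           hits of the query within one window

    Windows of 1.5x the query length are scanned with a hop of one stationary
    point; overlapping hits are merged into candidate motif regions."""
    win = int(round(win_factor * len(query_sp)))
    hits = []
    for start in range(0, max(1, len(alapana_sp) - win + 1), hop):
        window = alapana_sp[start:start + win]
        for s, e, score in match_window(window, query_sp):
            hits.append((start + s, start + e, score))
    # Group overlapping hits: each group spans from the first element of its
    # first member to the last element of its last member.
    hits.sort()
    groups = []
    for s, e, _ in hits:
        if groups and s <= groups[-1][1]:
            groups[-1] = (groups[-1][0], max(groups[-1][1], e))
        else:
            groups.append((s, e))
    return groups
```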

3.5.2 Second Pass: Determining Motifs from the Groups

    In the first pass a matching of only the stationary points is performed. As men-

    tioned above, even though the stationary points are matched it is not necessary

    that the trajectory between them matches. This leads to a large number of false

    alarms. Now that the search space is reduced, the RLCS is performed between the

    entire pitch contour of the potential motif region obtained in the first pass and the

    motif queried. The entire pitch contour is used in order to account for the trajectory

    information contained in the phrases. The threshold Td used for the first pass is

    tightened in this iteration for better precision while matching the entire feature

    vector. In this iteration, the cell of the score matrix having the maximum value is

    chosen and the sequence is traced back using the direction matrix from this cell.

    This sequence is hypothesized to be the motif. The database and experimentation

    are detailed in the following sections.

    3.6 Dataset

    Table 3.1 gives the details of the dataset of alapana used in this work. As mentioned

    above, this task will be performed on alapanas. The motifs are categorized into two

    types based on their durations: short motifs and long motifs. The details of these

    motifs are given in Table 3.2 and Table 3.3. The average duration is obtained from

    the labeled ground truth. The long motifs are inspired by the raga test conducted

    by Rama Verma1. Most people across the globe were able to unambiguously

    determine the identity of ragas using these motifs. An attempt was made to use

the motifs from Rama Verma's raga test directly. As the recordings are rather noisy,

    1http://www.youtube.com/watch?v=3nRtz9EBfeY


the same motifs were generated by a professional musician. In particular, we have

    chosen only the raga Bhairavi for illustration.

    Table 3.1: Dataset of alapanas

Raga Name   Number of Alapanas   Number of Artists   Average Duration (mins)   Total Duration (mins)
Kamboji     27                   12                  9.73                      262.91
Bhairavi    16                   13                  10.65                     170.48

    Table 3.2: Short Motifs

Raga Name   Labeled Ground Truth   Average Duration (secs)
Kamboji     70                     1.8837
Bhairavi    103                    1.3213

    Table 3.3: Long Motifs

Raga Name   Labeled Ground Truth   Average Duration (secs)
Bhairavi    59                     3.18

    3.7 Experiments and Results

    3.7.1 Querying motifs in the alapanas

    RLCS was performed on the dataset of alapanas. The distance function used for

    RLCS is cubic in nature with the equation given below.

d_{i,j} =
\begin{cases}
\frac{|x_i - y_j|^3}{(3\,st)^3} & \text{if } |x_i - y_j| < 3\,st \\
0 & \text{otherwise}
\end{cases}
\qquad (3.11)

where $x_i$ and $y_j$ represent pitch values and st represents a semitone in cents. Due

    to different styles of various musicians, an exact match between two pitch values

    contributing to the same svara cannot be expected. Hence, in this work a leeway

    of 3 semitones is allowed between pitch values. Musically two pitch values, 3

    semitones apart, cannot be called similar but this issue is addressed by the cubic

    nature of the similarity function. The function reaches its half value when the

    difference in two symbols is approximately half a semitone. Therefore, lower

    distance values are obtained when the corresponding pitch values are at most half

    a semitone apart. In this work, the phrases sung across octaves are ignored. For

this experiment the parameters set were as follows: $T_d = 0.45$; $\beta = 0.5$; $\tau = 0.8$.
The parameter $\tau$, $0 < \tau < 1$, is a user-defined parameter that ensures that the length of
the query motif is matched with that of the alapana.
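A direct transcription of (3.11), taking one semitone as 100 cents; the function name and the example values are only illustrative.

```python
ST = 100.0  # one semitone in cents

def cubic_distance(x, y, st=ST):
    """Cubic distance of (3.11) between two pitch values x, y in cents.

    Within the 3-semitone leeway the distance grows cubically, so pitch values
    at most about half a semitone apart receive very small distances."""
    diff = abs(x - y)
    if diff < 3 * st:
        return diff ** 3 / (3 * st) ** 3
    return 0.0  # branch as printed in (3.11); phrases sung across octaves are ignored

# Example: two pitches half a semitone apart
print(cubic_distance(1200.0, 1250.0))  # ~0.0046
```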

    The details of the number of ground truth motifs retrieved and the total number

    of trues retrieved are given in Table 3.4 and Table 3.5 for short motifs and long motifs

    respectively. The number of false positives, retrieved are however substantial. This

    is affordable since the objective in the first pass is to obtain the maximum number

    of the regions similar to the motif. The second iteration of RLCS is performed to

    filter out the false positives.

    Table 3.4: Short Motifs: Retrieved regions after the first pass

Raga Name   Total Retrieved   True Retrieved   Precision (%)   Recall (%)
Kamboji     719               58               8.07            82.86
Bhairavi    474               91               19.20           88.35

    Now that the candidate motif regions are known, the second pass of RLCS is

    conducted wherein the same motifs are queried in the regions retrieved by the

    first pass. The entire pitch contour of the query and reference are used for this

Table 3.5: Long Motifs: Retrieved regions after the first pass

Raga Name   Total Retrieved   True Retrieved   Precision (%)   Recall (%)
Bhairavi    194               51               26.29           86.44

    task in order to account for the information of trajectory of pitches between the

    stationary points. The requirement of a query motif for such a search is due to

    the scarce rendition of certain characteristic phrases in an alapana. The spotting of

    such phrases proves to be useful to musicians and students for analysis purposes.

    The hits obtained in the second pass are sorted according to the RLCS scores.

    Top 10 hits per alapana are considered to compute the precision and recall. The

    motifs are not exact since they correspond to the extempore enunciation by an

artist. The relevant motifs are all the motifs that were marked as true in that alapana.

    The results are illustrated in Table 3.6 and Table 3.7.

    Table 3.6: Short Motifs: Top 10 retrieved motifs after the second pass

Raga Name   Precision (%)   Recall (%)
Kamboji     40.45           76.00
Bhairavi    41.25           91.04

    Table 3.7: Long Motifs: Top 10 retrieved motifs after the second pass

Raga Name   Precision (%)   Recall (%)
Bhairavi    31.65           74.58

3.7.2 Comparison between RLCS and Modified-RLCS using longer

    motifs

    Motif spotting is performed using RLCS and modified-RLCS on the dataset of

    alapanas using longer motifs as queries. Td is set to 0.45 in both the methods.

    This is done so that pitch values which are approximately one semitone apart are

considered as a rough match. $\tau$ is set to zero because the best value of $\tau$ could be
different for the two methods, which would make the comparison difficult.

    First Voice Activity Detection (VAD) is performed on the alapanas to get the

    voiced parts. This approximately segments the alapana into phrases. Instead of

    the entire alapana, these voiced regions are used. In the first pass, stationary points

    of the query motif and test alapana are used and the motif regions or groups are

    retrieved along with their scores. Each group either corresponds to a motif or a

    false alarm. A true group consists of one or more true positives. Score distribution

    of the true positive groups and false alarm groups for both the algorithms are

    shown in Figure 3.4. Each score value is subtracted from the mean of scores of the

    false alarms such that mean of the false alarms distribution becomes zero for both

    the algorithms. This enables a better comparison between RLCS and modified-

RLCS algorithms. The overlap between the score distributions of the true positive
groups and the false alarm groups is smaller for the modified-RLCS algorithm than for the
RLCS algorithm.

    Motifs are sparsely present in an alapana. Our purpose is to retrieve as many

    motifs as possible. Spotting all or most of the motifs is more crucial than removal

    of all false alarms. Therefore higher penalty is given for missing a motif than

    for a false alarm group. Score threshold is selected from the minimum detection

    cost function for both the algorithms. The sequences whose scores are above the

Figure 3.4: a) True positive groups and false alarm groups score distribution for RLCS. b) True positive groups and false alarm groups score distribution for modified-RLCS.

    score threshold are preserved. The details of the comparison after the first pass are

    shown in Table 3.8. Modified-RLCS has shown a clear improvement over RLCS in

    terms of false alarms and average duration of true positives and false alarms.

    Table 3.8: Long Motifs: Retrieved regions after the first pass

Algorithm       Total Retrieved   True Retrieved   True Positive Duration (avg.)   False Alarm Duration (avg.)   Precision (%)   Recall (%)
RLCS            194               51               9.61 secs                       12.73 secs                    26.29           86.44
Modified-RLCS   151               52               9.00 secs                       11.95 secs                    34.44           88.14

    These regions are used as the tests in the second pass. In the second pass, tonic

    normalized smoothed pitch contour is used as a feature. The primary objective

    of the second pass is to locate the motifs within a group and remove as many

false alarm groups as possible. The details of the comparison after the second

    pass are given in Table 3.9. There is a reduction in the number of false alarms in

    both the algorithms but the precision and localization of motifs are still better for

    modified-RLCS.

Table 3.9: Long Motifs: Retrieved regions after the second pass

Algorithm       Total Retrieved   True Retrieved   True Positive Duration (avg.)   False Alarm Duration (avg.)   Precision (%)   Recall (%)
RLCS            180               50               8.25 secs                       11.12 secs                    27.78           84.75
Modified-RLCS   144               50               7.68 secs                       11.00 secs                    34.72           84.75

    3.8 Discussion

    From the results obtained in Table 3.6 and Table 3.7, it is clear that even though

    the precision is low, the recall is high in most of the cases. Certain partial matches

    are also obtained where either the first part of the query is matched or the end of

    the query is matched. These are movements similar to those of the phrases and

    are interesting for a listener, learner, or researcher. High scores were obtained for

    certain false alarms. This is primarily due to melodic similarities between the false

    alarm and the original phrase.

Modified-RLCS results in fewer false alarms compared to RLCS in both the passes.
The duration of the hits is also shorter when compared to that of RLCS, which means
that the localization is better in modified-RLCS. This is primarily due to the fact

    that the actual length of the match is used in the modified-RLCS. In (3.6), the term

$\frac{c_{i,j}}{n}$

focuses on getting an RLCS whose rough length is as large as the length of the query motif,
but in (3.10), the term

$\frac{c_{i,j} + c_{i,j}}{ca_{i,j} + n}$

gives equal importance to both: getting an RLCS whose rough length is as large
as the length of the query motif and also on getting an RLCS whose rough length
is as large as the actual length of the RLCS. Due to this, shorter sequences also get

    a good score if they represent the motif adequately.

    3.8.1 Importance of VAD in motif spotting

    Voice Activity Detection (VAD) on the alapanas is a very crucial step. This step

    removes the noise and reduces the search space. Table 3.10 shows the results after

    the two passes using the modified-RLCS algorithm. The method for selecting the

    score thresholds after both the passes remains the same. The groups have become

    much longer though the number of true positive groups and false alarm groups

    has reduced. Some of the true positive groups have more than one instance of the

    motif. Therefore, the number of true positives are much more than the number

    of true positive groups. But they are not at all localized properly even after the

second pass. The number of false alarm groups is also small, but their duration is
very long, approximately 1 minute. The total duration of the false alarms without

    VAD is much more than that of those with VAD after each of the passes. This

    vindicates the use of VAD.

Table 3.10: Retrieved Groups after both the passes for modified-RLCS without VAD

Pass No.   True Groups Retrieved   True Positives Retrieved   True Group Duration (avg.)   False Groups Retrieved   False Group Duration (avg.)
Pass 1     29                      57                         80.31 secs                   20                       68.23 secs
Pass 2     38                      57                         44.33 secs                   31                       45.45 secs

3.9 Summary

    In this work, RLCS is used for motif spotting in alapanas in Carnatic music. It is

    illustrated that the stationary points of the pitch contour of a musical piece hold

    significant music information. It is then shown that quantizing the pitch contour of

    the alapana at the stationary points leads to no loss of information while it results

    in a significant reduction in the search space. The RLCS method is shown to

    give a high recall for the motif queried. Given that the objective is to explore the

    musical traits of a raga by spotting interesting melodic motifs rendered by various

    artists, the recall of the motif queried is of higher importance than the precision. A

    modified version of RLCS algorithm is also presented which gives better scores for

    sub-sequences which are shorter than the query but have matched reasonably well

    with most of the query. Modified-RLCS was tested on longer motifs and compared

    favorably with the original RLCS. The importance of performing Voice Activity

    Detection is also discussed.

CHAPTER 4

    Motif Discovery

    4.1 Introduction

    A raga in Carnatic music is characterised by typical phrases or motifs. They are

    primarily pitch trajectories in the time-frequency plane. Although, for annotation

    purposes, ragas in Carnatic music are based on 12 srutis (or semitones), the gamakas

    associated with the same semitone can vary significantly across ragas as discussed

    in Chapter 2. Nevertheless, although the phrases do not occupy fixed positions in

    the time-frequency (t-f) plane, an experienced listener can determine the identity

of a raga within a few seconds of an alapana. The objective of the work presented here

    is to determine typical motifs of a raga automatically. This is obtained by analyzing

    various compositions that are composed in a particular raga. Unlike Hindustani

    music, there is a huge repository of compositions by a number of composers in

    different ragas. It is often stated by musicians that the famous composers have

    composed such that a single line of a composition is replete with the motifs of the

    raga. In this work, we therefore take single lines of different compositions and

    determine the typical motifs of the raga.

    In a Carnatic music concert, many listeners from the audience are able to

    identify the raga at the very beginning of the composition, usually during the

singing of the first line itself; a line corresponds to one or more tala cycles.

    Thus, first lines of the compositions could contain typical motifs of a raga. A

    pattern which is repeated within a first line could still be not specific to a raga.

Figure 4.1: RLCS matching two sequences partially

    Whereas, a pattern that is present across different composition lines could be a

    typical motif of that raga. Instead of just using first lines, we have also used other

    lines of compositions, namely, the lines from pallavi, anupallavi and charanam. In

    this chapter, an attempt is made to find repeating patterns across these lines and

    not within a line. Typical motifs are filtered from the generated repeating patterns

    during post processing. These typical motifs are available online 1.

    The length of the typical motif to be discovered is not known a priori. Therefore

    there is a need for a technique which can itself determine the length of the motif

    at the time of discovering it. Dynamic Time Warping (DTW) based algorithms can

only find a pattern of a specific length since they perform end-to-end matching of the

    query and test sequence. There is another version of DTW known as Unconstrained

    End Point-DTW (UE-DTW) that can match the whole query with a partial test

    1http://www.iitm.ac.in/donlab/typicalmotifs.html

but still the query is not partially matched. Longest Common Subsequence (LCS)

    algorithm on the other hand can match the partial query with partial test sequences

    since it looks for a Longest Common Subsequence which need not be end-to-end.

    LCS by itself is not appropriate as it requires discrete symbols and does not account

    for local similarity. A modified version of LCS known as Rough Longest Common

    Subsequence takes continuous symbols and takes into account the local similarity

    of the Longest Common Subsequence. The algorithm proposed in [30] to find

    Rough Longest Common Subsequence between two sequences fits the bill for the

    task of motif discovery. An example of RLCS algorithm matching two partial

    phrases is shown in Figure 4.1. The two music segments are represented by their

    tonic normalized pitch contours. The stationary points, where the first derivative

    is zero, of the tonic normalized pitch contour are first determined. The points are

    then interpolated using cubic Hermite interpolation to smooth the contour.

    In Chapter 3, plenty of false alarms were observed. One of the most prevalent

false alarms was found to be due to a sustained note appearing in the phrase. The

    slope of the linear trend in stationary points along with its standard deviation is

    used to address this issue.
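One possible reading of this filter is sketched below: a candidate whose stationary points form a nearly flat linear trend with little spread around it is treated as a sustained note. The function name and thresholds are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def looks_like_sustained_note(stationary_pitches, slope_thresh=5.0, std_thresh=30.0):
    """Heuristic: fit a linear trend to the stationary-point pitches (in cents).

    A near-zero slope together with a small standard deviation of the residuals
    suggests a single sustained note rather than a moving phrase."""
    y = np.asarray(stationary_pitches, dtype=float)
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    residual_std = np.std(y - (slope * x + intercept))
    return abs(slope) < slope_thresh and residual_std < std_thresh
```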

    The rest of the chapter is organized as follows. In Section 4.2 the use of com-

    position lines to find motifs is discussed. Section 4.3 discusses the optimization

    criteria to find the Rough Longest Common Subsequence. Section 4.4 describes

the proposed approach for discovering typical motifs of ragas. Section 4.5 describes

    the dataset used in this work. Experiments and results are presented in Section

    4.6.

4.2 Lines from the compositions

As previously mentioned, the first line of a composition contains the characteristic
traits of a raga. The importance of the first lines and the raga information they hold is
illustrated in great detail in T. M. Krishna's book on Carnatic music [26]. T. M.
Krishna states that the opening section, called pallavi (discussed in Chapter 2), directs

    the melodic flow of the raga and through its rendition, the texture of the raga can

    be felt. Motivated by this observation, an attempt is made to verify the conjecture

    that typical motifs of a raga can be obtained from the first lines of compositions.

Along with the lines from the pallavi, we have also selected a few lines from other

    sections, namely, anupallavi and charanam. Anupallavi comes after pallavi and the

    melodic movements in this section tend to explore the raga in the higher reaches

    of the octave as discussed in Chapter 2.

    4.3 Optimization criteria to find Rough Longest Com-

    mon Subsequence

The Rough Longest Common Subsequence (RLCS) between two sequences, $X = x_1, x_2, \ldots, x_n$ and $Y = y_1, y_2, \ldots, y_m$, of length n and m, is defined as the Longest Common Subsequence (LCS) $Z_{XY} = (x_{i_1}, y_{j_1}), (x_{i_2}, y_{j_2}), \ldots, (x_{i_p}, y_{j_p})$, with $1 \le i_1 < i_2 < \ldots < i_p \le n$ and $1 \le j_1 < j_2 < \ldots < j_p \le m$.

    5.4 Raga Verification

Let $T_{raga} = \{t_1, t_2, \ldots, t_{N_{raga}}\}$ represent a set of template recordings, where raga
refers to the name of the raga and $N_{raga}$ is the total number of templates for that

raga. During testing, an input test recording, X, with a claim is tested against all

    the template recordings of the claimed raga. The final score is computed as given

    in (5.7).

score(X, claim) = \max_{Y \in T_{claim}} \big( score(lcss_{XY}) \big) \qquad (5.7)

    The final decision, of accepting or rejecting the claim, directly based on this score

    could be erroneous. Score normalisation with cohorts is essential to make a deci-

    sion, especially when the difference between two ragas is subtle.
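As a sketch, (5.7) amounts to taking the best template score for the claimed raga; lcss_score below stands in for the LCSS similarity between two recordings and is an assumed name, not the thesis's API.

```python
def claim_score(test_recording, claimed_raga, templates, lcss_score):
    """Score of a claim as in (5.7): best LCSS score over the claimed raga's templates.

    templates : dict mapping a raga name to its list of template recordings."""
    return max(lcss_score(test_recording, t) for t in templates[claimed_raga])
```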

    5.4.1 Score Normalization

LCSS scores corresponding to correct and incorrect claims are referred to as true and
imposter scores, respectively. If the imposter is a cohort raga, then the imposter
score is also referred to as a cohort score. Various score normalization techniques are

    discussed in the literature for speech recognition, speaker/language verification

    and spoken term detection [1, 34].

    Zero normalization (Z-norm) uses the mean and variance estimate of cohort

    scores for scaling. The advantage of Z-norm is that the normalization parameters

    can be estimated off-line. Template recordings of a raga are tested against template

    recordings of its cohorts and the resulting scores are used to estimate a raga specific

    mean and variance for the imposter distribution. The normalized scores using Z-

    norm can be calculated as

score_{norm}(X, claim) = \frac{score(X, claim) - \mu_I^{claim}}{\sigma_I^{claim}} \qquad (5.8)

where $\mu_I^{claim}$ and $\sigma_I^{claim}$ are the estimated imposter mean and standard deviation for the claimed raga.

    Test normalization (T-norm) is also based on a mean and variance estimation

    of cohort scores for scaling. The normalization parameters in T-norm are estimated

    online as compared to their offline estimation in Z-norm. During testing, a test

    recording is tested against template recordings of cohort ragas and the resulting

    scores are used to estimate mean and variance parameters. These parameters are

    then used to perform the normalization given by (5.8).

    The test recordings of a raga may be scored differently against templates corre-

    sponding to the same raga or imposter raga. This can cause overlap between the

    true and imposter score distributions. T-norm attempts to reduce this overlap.

    The templates that are stored and the audio clip that is used during test can be

    from different environments.
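A small sketch of the two normalisations follows; the imposter statistics for Z-norm are assumed to be precomputed per raga, while for T-norm the cohort scores come from scoring the test recording itself against the cohorts' templates. Names are illustrative.

```python
import statistics

def z_norm(raw_score, imposter_mean, imposter_std):
    """Z-norm as in (5.8): offline imposter statistics of the claimed raga."""
    return (raw_score - imposter_mean) / imposter_std

def t_norm(raw_score, cohort_scores):
    """T-norm: the same scaling, but the statistics are estimated online from
    the test recording's scores against the cohort templates."""
    mu = statistics.mean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)
    return (raw_score - mu) / sigma
```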

    5.5 Performance evaluation

In this section, we describe the results of raga verification using the LCSS algorithm

    in comparison with Rough Longest Common Subsequence (RLCS) algorithm [30]

    and Dynamic Time Warping (DTW) algorithm using different normalizations.

    5.5.1 Experimental configuration

Only 17 ragas out of 30 were used for raga verification, as a sufficient number of
relevant cohorts could be obtained from the 30 ragas only for these 17. This is due to the non-
symmetric nature of the cohorts as discussed in Section 5.2. For raga verification,

    40% of the pallavi lines are used as templates and remaining 60% are used for

Table 5.2: EER (%) for different algorithms using different normalizations on different datasets.

Algorithm      Dataset   No Norm   Z-norm   T-norm
DTW            D1        27.78     29.88    17.45
               D2        40.81     40.03    35.96
RLCS           D1        24.43     27.22    14.87
               D2        41.72     42.58    41.20
RLCS-MOD       D1        20.88     22.72    13.25
               D2        36.72     37.68    34.58
LCSS (hard)    D1        29.00     31.75    15.65
               D2        40.28     40.99    34.11
LCSS (soft)    D1        21.89     24.11    12.01
               D2        37.24     38.96    34.57

testing. This partitioning of the dataset is done in two ways, referred to as D1 and D2.
In D1, the variations of a pallavi line might fall into both the templates and the test set,
though it is not necessary. Variations of a pallavi line are different from the pallavi line due
to improvisations. In D2, these variations either all belong to the templates or all
belong to the test set, but are strictly not present in both. The values of the thresholds,
sim and rc, are empirically chosen as 0.45 and 0.5, respectively. The penalty issued for gaps in
segments is empirically chosen as 0.5.

    5.5.2 Results

    Table 5.2 and Figure 5.2 show the comparison of LCSS with DTW and RLCS using

    different normalizations. Equal Error rate (EER) refers to a point where false alarm

rate and miss rate are equal. For T-norm, the best 20 cohort scores were used for

    normalization. LCSS (soft) with T-norm performs best for D1 around the EER

    point, and for high miss rates and low false alarms, whereas it performs poorer

    than LCSS (hard) for low miss rates and high false alarms. This behavior appears

    to be reversed for D2. The magnitude around EER is much greater for D2. This

Figure 5.2: DET curves comparing the LCSS algorithm with different algorithms using different score normalizations: a) DET curves for dataset D1, b) DET curves for dataset D2.

is because none of the variations of the pallavi lines in the test set are present in the
templates. It is also shown that RLCS performs poorer than all the other algorithms
for D2. The curves also show no improvement for Z-norm compared to the baseline
with no normalization. This can happen due to the way the normalization parameters
are estimated for Z-norm. For example, some of the templates, which may not be
similar to the test, can be similar to some of the cohorts' templates, resulting in
a higher mean. This would not have happened in T-norm, where the test itself is
tested against the cohorts' templates.

    5.6 Discussion

    In this section, we discuss how LCSS (hard) and LCSS (soft) can be combined

    to achieve better performance. We also verify that T-norm reduces the overlap

    between true and imposter scores.

Table 5.3: Number of claims correctly verified by hard-LCSS only, by soft-LCSS only, by both, and by neither of them for D1 and D2 using T-norm

Dataset   Claim type   Hard only   Soft only   Both   Neither
D1        True         23          55          289    77
          False        46          78          1745   54
D2        True         47          23          155    220
          False        99          75          1585   168

    5.6.1 Combining hard-LCSS and soft-LCSS

    Instead of selecting a threshold, we will assume that a true claim is correctly

    verified when its score is greater than all the cohort scores. Similarly, a false claim

is correctly verified when its score is less than at least one of the cohort scores.

    Table 5.3 shows the number of claims correctly verified only by hard-LCSS, only

    by soft-LCSS, by both and by neither of them. It is clear that there is an overlap

    between the correctly verified claims of hard-LCSS and soft-LCSS. Nonetheless,

the number of claims verified by only one of the two is also significant. Therefore, the

    combination of these two algorithms could result in a better performance.
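The counting rule behind Table 5.3 can be written as a small check, under the assumptions stated above (the score lists are assumed to be already computed):

```python
def claim_correctly_verified(claim_is_true, claim_score, cohort_scores):
    """A true claim counts as correctly verified if it beats every cohort score;
    a false claim counts as correctly verified if at least one cohort score beats it."""
    if claim_is_true:
        return all(claim_score > s for s in cohort_scores)
    return any(claim_score < s for s in cohort_scores)
```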

    5.6.2 Reduction of overlap in score distribution by T-norm

    Figure 5.3 shows the effect of T-norm on the distribution of hard-LCSS scores. It is

clearly seen that the overlap between the true and imposter score distributions is

    reduced significantly. For visualization purposes, the true score distributions are

    scaled to zero mean and unit variance and corresponding imposter score distribu-

    tions are scaled appropriately.

Figure 5.3: Showing the effect of T-norm on the score distribution (LCSS (hard) true and imposter score densities, without normalization and with T-norm)

5.6.3 Scalability of raga verification

The verification of a raga depends on the number of its cohort ragas, which are
usually 4 or 5. Since it does not depend on all the ragas in the dataset, as raga
identification does, any number of ragas can be added to the dataset.

    5.7 Summary

    In this Chapter, we have presented a different approach to raga analysis in Carnatic

    music. Instead of raga identification, raga verification is performed. A set of cohorts

    for every raga is defined. The identity of an audio clip is presented with a claim.

    The claimed raga is verified by comparing with the templates of the claimed raga

    and its cohorts by using a novel approach, Longest Common Segment Set (LCSS).

    A set of 17 ragas and its cohorts constituting 30 ragas is tested using appropriate

    score normalization techniques. An equal error rate of about 12% is achieved. This

    approach is scalable to any number of ragas as the given raga and only its cohorts

    need to be added to the system.

CHAPTER 6

    Conclusion

    Typical motifs of a raga are used to establish its identity in all improvisational and

    compositional forms. Along with raga identity, typical motifs can also be used to

    index a recording for archival purposes. Further, indexed motifs can also be used

    to explore and to analyze the melodic phrases connecting them, which could be

    useful for both listeners and learners of Carnatic music.

    The objective of this thesis was to develop algorithmic techniques for automatic

    extraction of typical motifs and for performing raga verification using the regions

    replete with typical motifs. Some of the salient points presented in this thesis are

    as follows:

    6.1 Salient Points

- It was shown using pitch histograms that the notes in Carnatic music have a greater pitch range as compared to Hindustani music and Western classical music. This renders the symbolic representation of Carnatic music a non-trivial task and poses significant challenges in the analysis of Carnatic music.

- The stationary points of the pitch contour were shown to preserve the essential raga information; however, the exact melodic information was lost. For the task of finding different renditions of typical motifs, these stationary points were used to reduce the search space. A measure based on the slope of the linear trend in stationary points, along with its standard deviation, is used to reduce the false alarms.

- An algorithm was proposed for time-series matching which is a modification of an existing algorithm known as Rough Longest Common Subsequence. This algorithm can match shorter sequences that are common between two longer sequences. However, the score was penalized with respect to the length of the longer sequences. Therefore, matched shorter sequences can get low scores suggesting that the match is poor even when the match is good.

- The second algorithm proposed for time-series matching, known as Longest Common Segment Set, was novel. It can also match shorter sequences that are common between two longer sequences, but the score was not penalized with respect to the longer sequences. Therefore, it was more effective in the extraction of the common shorter sequences.

- Typical motifs of duration of approximately four seconds were found to be more relevant for raga identity. Shorter motifs had less context and resulted in a great deal of false alarms.

- Typical motifs were found to be prevalent in the pallavi lines of the compositions. Therefore, these pallavi lines were used in the task of raga verification.

- In raga verification, cohort ragas (usually four or five ragas) were used for normalizing the score instead of all the ragas in the dataset. Therefore, the proposed raga verification system was found to be scalable to any number of ragas. For a new raga to be added into the system, only the templates of the new raga and its cohorts were required, without altering the existing system.

    6.2 Criticism of the work

    In this section, we discuss the shortcomings of the approaches proposed in this

    thesis.

- The proposed algorithms for time-series matching require that the ordering of the common shorter sequences is the same in both the longer sequences. If the ordering is different, not all of the common shorter sequences are matched.

- The algorithms also fail to match sequences if they are in different octaves.

- The performance of the algorithms is also sensitive to pitch errors. This problem is dealt with to some extent by smoothing the pitch contours if the pitch errors are not significantly large.

- Typical motifs are retrieved only if they repeat across the composition lines. Therefore, this approach relies on a large number of composition lines.

- Raga verification also needs a large number of composition lines (templates) such that most of the typical motifs are represented.

6.3 Future work

    Given the drawbacks of the proposed approaches in the previous section, the

    following improvements can be made:

- For time-series matching, when the ordering of common shorter sequences is different, no single alignment can align all the common shorter sequences. In such situations, different alignments can be inspected to extract all the common shorter sequences irrespective of their order.

- For matching sequences that belong to different octaves, one of the two sequences can be shifted to different octaves and the matching can be performed with all the shifted sequences.

- Instead of using pitch to represent the melody, a transformation of the frequency spectrum can be used that reduces other noises and preserves the melody. This will help in improving the performance of the algorithm, which is sensitive to pitch errors.

LIST OF PAPERS BASED ON THESIS

1. Shrey Dutta, Krishnaraj Sekhar PV and Hema A. Murthy. Raga Verification in Carnatic Music using Longest Common Segment Set. In Proceedings of the 16th International Society for Music Information Retrieval Conference, 2015.

2. Shrey Dutta and Hema A. Murthy. Discovering Typical Motifs of a Raga from One-Liners of Songs in Carnatic Music. In Proceedings of the 15th International Society for Music Information Retrieval Conference, pages 397-402, 2014.

3. Shrey Dutta and Hema A. Murthy. A modified rough longest common subsequence algorithm for motif spotting in an Alapana of Carnatic Music. In 20th National Conference on Communications (NCC), pages 1-6, 2014.

4. Vignesh Ishwar, Shrey Dutta, Ashwin Bellur and Hema A. Murthy. Motif Spotting in an Alapana in Carnatic Music. In Proceedings of the 14th International Society for Music Information Retrieval Conference, pages 499-504, 2013.

REFERENCES

[1] Roland Auckenthaler, Michael Carey, and Harvey Lloyd-Thomas. Score normalization for text-independent speaker verification systems. Digital Signal Processing, 10:42-54, 2000.

[2] Ashwin Bellur and Hema A Murthy. A cepstrum based approach for identifying tonic pitch in Indian classical music. In National Conference on Communications, pages 1-5, 2013.

[3] Yueguo Chen, Mario A. Nascimento, Beng Chin Ooi, and Anthony K. H. Tung. Spade: On shape-based pattern detection in streaming time series. In International Conference on Data Engineering, pages 786-795, 2007.

[4] Bill Chiu, Eamonn Keogh, and Stefano Lonardi. Probabilistic discovery of time series motifs. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 493-498, 2003.

[5] P Chordia and A Rae. Raag recognition using pitch-class and pitch-class dyad distributions. In Proceedings of International Society for Music Information Retrieval Conference, ISMIR, pages 431-436, 2007.

[6] Tom Collins, Andreas Arzt, Sebastian Flossmann, and Gerhard Widmer. Siarct-cfp: Improving precision and the discovery of inexact musical patterns in point-set representations. In International Society for Music Information Retrieval, pages 549-554, 2013.

[7] Tom Collins, Jeremy Thurlow, Robin Laney, Alistair Willis, and Paul H. Garthwaite. A comparative evaluation of algorithms for discovering translational patterns in baroque keyboard works. In International Society for Music Information Retrieval, pages 3-8, 2010.

[8] Darrell Conklin. Discovery of distinctive patterns in music. Intelligent Data Analysis, pages 547-554, 2010.

[9] Darrell Conklin. Distinctive patterns in the first movement of Brahms' string quartet in C minor. Journal of Mathematics and Music, 4(2):85-92, 2010.

[10] Jonathan D. Cryer and Kung-Sik Chan. Time Series Analysis: with Applications in R. Springer, 2008.

[11] Pranay Dighe, Parul Agarwal, Harish Karnick, Siddartha Thota, and Bhiksha Raj. Scale independent raga identification using chromagram patterns and swara based features. In 2013 IEEE International Conference on Multimedia and Expo Workshops, San Jose, CA, USA, July 15-19, 2013, pages 1-4, 2013.

[12] Pranay Dighe, Harish Karnick, and Bhiksha Raj. Swara histogram based structural analysis and identification of Indian classical ragas. In Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013, Curitiba, Brazil, November 4-8, 2013, pages 35-40, 2013.

[13] Subbarama Dikshitulu. Sangita sampradaya pradarsini. The Music Academy Madras, Vol. 2, 2011.

[14] D.P.W. Ellis and G.E. Poliner. Identifying cover songs with chroma features and dynamic programming beat tracking. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 4, pages 1429-1432, 2007.

[15] F. N. Fritsch and R. E. Carlson. Monotone Piecewise Cubic Interpolation. SIAM Journal on Numerical Analysis, Vol. 17, No. 2, 1980.

[16] Toni Giorgino. Computing and visualizing dynamic time warping alignments in R: The dtw package. Journal of Statistical Software, 31(7):1-24, 2009.

[17] AnYuan Guo and Hava Siegelmann. Time-warped longest common subsequence algorithm for music retrieval. In Proceedings of 5th International Conference on Music Information Retrieval (ISMIR), 2004. http://works.bepress.com/hava_siegelmann/13.

[18] H G Ranjani, S Arthi, and T V Sreenivas. Shadja, swara identification and raga verification in alapana using stochastic models. In 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 29-32, 2011.

[19] I S H Suyoto, A L Uitdenbogerd, and F Scholer. Searching musical audio using symbolic queries. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, pages 372-381, 2008.

[20] Vignesh Ishwar, Ashwin Bellur, and Hema A Murthy. Motivic analysis and its relevance to raga identification in Carnatic music. In Workshop on Computer Music, Istanbul, Turkey, July 2012. http://compmusic.upf.edu/publications.

[21] J Serra, G K Koduri, M Miron, and X Serra. Tuning of sung Indian classical music. In Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR, pages 157-162, 2011.

[22] Berit Janssen, W. Bas de Haas, Anja Volk, and Peter van Kranenburg. Discovering repeated patterns in music: state of knowledge, challenges, perspectives. International Symposium on Computer Music Modeling and Retrieval (CMMR), pages 225-240, 2013.

[23] Gopala Krishna Koduri, Sankalp Gulati, and Preeti Rao. A survey of raaga recognition techniques and improvements to the state-of-the-art. Sound and Music Computing, 2011.

[24] Gopala Krishna Koduri, Sankalp Gulati, Preeti Rao, and Xavier Serra. Raga recognition based on pitch distribution methods. Journal of New Music Research, 41(4):337-350, 2012.

[25] A.S. Krishna, P.V. Rajkumar, K.P. Saishankar, and M. John. Identification of Carnatic raagas using hidden markov models. In Applied Machine Intelligence and Informatics (SAMI), 2011 IEEE 9th International Symposium on, pages 107-110, Jan. 2011.

[26] T. M. Krishna. A Southern Music: The Karnatic Story, chapter 5. HarperCollins, India, 2013.

[27] T M Krishna and Vignesh Ishwar. Carnatic music: Svara, gamaka, motif and raga identity. In Workshop on Computer Music, Istanbul, Turkey, July 2012. http://compmusic.upf.edu/publications.

[28] A Krishnaswamy. Application of pitch tracking to South Indian classical music. In Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 557-560, 2003.

[29] V. Kumar, H. Pandya, and C.V. Jawahar. Identifying ragas in Indian music. In 22nd International Conference on Pattern Recognition (ICPR), pages 767-772, 2014.

[30] Hwei-Jen Lin, Hung-Hsuan Wu, and Chun-Wei Wang. Music matching based on rough longest common subsequence. Journal of Information Science and Engineering, pages 95-110, 2011.

[31] Lie Lu, Muyuan Wang, and Hong-Jiang Zhang. Repeating pattern discovery and structure analysis from acoustic music data. In Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, pages 275-282, 2004.

[32] David Meredith, Kjell Lemstrom, and Geraint A. Wiggins. Algorithms for discovering repeated patterns in multidimensional representations of polyphonic music. Journal of New Music Research, pages 321-345, 2002.

[33] Meinard Muller, Frank Kurth, and Michael Clausen. Audio matching via chroma-based statistical features. In Proceedings of International Society for Music Information Retrieval (ISMIR), pages 288-295, 2005.

[34] Jiri Navratil and David Klusacek. On linear DETs. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 229-232, 2007.

[35] Gaurav Pandey, Chaitanya Mishra, and Paul Ipe. Tansen: A system for automatic raga identification. In Indian International Conference on Artificial Intelligence, pages 1350-1363, 2003.

[36] Pranav Patel, Eamonn Keogh, Jessica Lin, and Stefano Lonardi. Mining motifs in massive time series databases. In Proceedings of IEEE International Conference on Data Mining (ICDM'02), pages 370-377, 2002.

[37] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Second Edition. Oxford University Press, 1992.

[38] P. Rao, J. Ch. Ross, K. K. Ganguli, V. Pandit, V. Ishwar, A. Bellur, and H. A. Murthy. Melodic motivic analysis of Indian music. Journal of New Music Research, 43(1):115-131, 2014.

[39] Joe Cheri Ross, Vinutha T. P., and Preeti Rao. Detecting melodic motifs from audio for Hindustani classical music. In Proceedings of 13th International Society for Music Information Retrieval (ISMIR), pages 193-198, 2012.

[40] Joe Cheri Ross and Preeti Rao. Detection of raga-characteristic phrases from Hindustani classical music audio. Workshop on Computer Music, 2012. http://compmusic.upf.edu/publications.

[41] Justin Salamon and Emilia Gomez. Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech and Language Processing, 20(6):1759-1770, August 2012.

[42] Sridharan Sankaran, Krishnaraj P V, and Hema A Murthy. Automatic segmentation of composition in Carnatic music using time-frequency CFCC templates. In Proceedings of 11th International Symposium on Computer Music Multidisciplinary Research (CMMR), 2015.

[43] J. Serra, E. Gomez, P. Herrera, and X. Serra. Chroma binary similarity and local alignment applied to cover song identification. Audio, Speech, and Language Processing, IEEE Transactions on, 16(6):1138-1151, Aug 2008.

[44] Joan Serra, Gopala K. Koduri, Marius Miron, and Xavier Serra. Assessing the tuning of sung Indian classical music. In Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011, Miami, Florida, USA, October 24-28, 2011, pages 157-162, 2011.

[45] Sankalp Gulati, Joan Serra, and Xavier Serra. An evaluation of methodologies for melodic similarity in audio recordings of Indian art music. In Proceedings of the 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015, pages 678-682, April 2015.

[46] Surendra Shetty. Raga mining of Indian music by extracting arohana-avarohana pattern. International Journal of Recent Trends in Engineering, 1(1), 2009.

[47] Rajeswari Sridhar and Tv Geetha. Raga identification of Carnatic music for music information retrieval. International Journal of Recent Trends in Engineering, 1(1):1-4, 2009.

[48] M Subramanian. Carnatic ragam thodi - pitch analysis of notes and gamakams. Journal of the Sangeet Natak Akademi, XLI(1):3-28, 2007.

[49] D Swathi. Analysis of Carnatic music: A signal processing perspective. M.Tech. Thesis, IIT Madras, 2009.

[50] Alexandra L. Uitdenbogerd and Justin Zobel. Manipulation of music for melody matching. In MULTIMEDIA '98: Proceedings of the sixth ACM international conference on Multimedia, pages 235-240, 1998.
