an efficient score alignment algorithm and its applications

90
An Ecient Score Alignment Algorithm and its Applications by Emily H. Zhang B.S., Massachusetts Institute of Technology (2015) Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Computer Science and Engineering at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2017 c Massachusetts Institute of Technology 2017. All rights reserved. Author ................................................................ Department of Electrical Engineering and Computer Science May 26, 2017 Certified by ............................................................ Prof. Michael S. Cuthbert Associate Professor Thesis Supervisor Accepted by ........................................................... Prof. Christopher J. Terman Chairman, Master of Engineering Thesis Committee

Upload: others

Post on 15-Oct-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Efficient Score Alignment Algorithm and its Applications

An Efficient Score Alignment Algorithm and itsApplications

by

Emily H. Zhang

B.S., Massachusetts Institute of Technology (2015)

Submitted to the Department of Electrical Engineering and ComputerScience

in partial fulfillment of the requirements for the degree of

Master of Engineering in Computer Science and Engineering

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2017

c� Massachusetts Institute of Technology 2017. All rights reserved.

Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Department of Electrical Engineering and Computer Science

May 26, 2017

Certified by. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Prof. Michael S. Cuthbert

Associate ProfessorThesis Supervisor

Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Prof. Christopher J. Terman

Chairman, Master of Engineering Thesis Committee

Page 2: An Efficient Score Alignment Algorithm and its Applications

2

Page 3: An Efficient Score Alignment Algorithm and its Applications

An Efficient Score Alignment Algorithm and its Applications

by

Emily H. Zhang

Submitted to the Department of Electrical Engineering and Computer Scienceon May 26, 2017, in partial fulfillment of the

requirements for the degree ofMaster of Engineering in Computer Science and Engineering

Abstract

String alignment and comparison in Computer Science is a well-explored space withclassic problems such as Longest Common Subsequence that have practical applicationin bioinformatic genomic sequencing and data comparison in revision control systems.In the field of musicology, score alignment and comparison is a problem with manysimilarities to string comparison and alignment but also vast differences. In particularwe can use ideas in string alignment and comparison to compare a music score in theMIDI format with a music score generated from Optical Musical Recognition (OMR),both of which have incomplete or wrong information, and correct errors that wereintroduced in the OMR process to create an improved third score. This thesis createsa set of algorithms that align and compare MIDI and OMR music scores to produce acorrected version of the OMR score that borrows ideas from classic computer sciencestring comparison and alignment algorithm but also incorporates optimizations andheuristics from music theory.

Thesis Supervisor: Prof. Michael S. CuthbertTitle: Associate Professor

3

Page 4: An Efficient Score Alignment Algorithm and its Applications

4

Page 5: An Efficient Score Alignment Algorithm and its Applications

Acknowledgments

I’d like to thank my advisor, Professor Michael Cuthbert for not only advising me on

this project but also for providing invaluable feedback and encouragement throughout.

I am extremely grateful that I had the opportunity to combine my love of computer

science and music by contributing to the music21 project. It has been a rewarding

experience working with you.

I’d like to thank the Music and Theater Arts Department for its support in both my

musical education and for giving me so many opportunities to explore music from

many different approaches, my thesis work being one of them.

Thank you to all who provided me with moral support, feedback, and food during this

project, including Maryam K., Cathie Y., Priya P., Allison Z., Colin M., Jonathan M.,

Peter G., Katie G., Dan P., Michelle Q., Brian M., Harry B., and Katie L.

Finally, I’d like to my family, especially my sister Allison, whose unwavering support

have helped me succeed in my endeavors at MIT.

5

Page 6: An Efficient Score Alignment Algorithm and its Applications

6

Page 7: An Efficient Score Alignment Algorithm and its Applications

Contents

1 Introduction 15

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.1.1 Why Compare Music? . . . . . . . . . . . . . . . . . . . . . . 15

1.1.2 Why Correct Music? . . . . . . . . . . . . . . . . . . . . . . . 16

1.2 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2 Background and Related Work 19

2.1 Computer Musical Terminology . . . . . . . . . . . . . . . . . . . . . 19

2.1.1 OMR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.1.2 MIDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.2 Non-musical Work in Sequence Alignment . . . . . . . . . . . . . . . 20

2.2.1 An Overview of Sequence Alignment . . . . . . . . . . . . . . 20

2.2.1.1 Edit Distance . . . . . . . . . . . . . . . . . . . . . . 21

2.2.1.2 Alignment . . . . . . . . . . . . . . . . . . . . . . . . 23

2.3 Turning Sequence Alignment into a Musical Problem . . . . . . . . . 26

3 Previous Work 29

3.1 Related Work by Others . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.1.1 Typke: Music Retrieval based on Melodic Similarity . . . . . . 29

3.1.2 Church and Cuthbert: Rhythmic Comparison and Alignment . 30

3.1.3 IMSLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.1.4 Viro: Peachnote, Musical Score Search and Analysis . . . . . . 30

3.1.5 White: Statistical Properties of Music and MIDI . . . . . . . . 30

7

Page 8: An Efficient Score Alignment Algorithm and its Applications

3.1.6 Rebelo et al: Optical Music Recognition - State-of-the-Art and

Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.1.7 OpenScore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.1.8 music21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.2 Previous Work: A Modular Universal Hasher . . . . . . . . . . . . . . 32

3.2.1 Why Musical Comparison and Alignment Needs a Modular

Hashing Function . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.2.2 Previous Work on Modular Hashing Functions . . . . . . . . . 33

3.2.3 Overview of System . . . . . . . . . . . . . . . . . . . . . . . . 34

3.2.3.1 Pre-Hashing: High Level Hasher Settings . . . . . . . 35

3.2.3.2 Pre-Hashing: Low Level General Settings . . . . . . 36

3.2.3.3 Pre-Hashing: Low Level Note Settings . . . . . . . . 36

3.2.3.4 Pre-Hashing: Low Level Chord Hashing Settings . . 37

3.2.3.5 Hashing: hashStream . . . . . . . . . . . . . . . . . 37

3.2.3.6 Hashing: Setting up ValidTypes and StateVars . . 38

3.2.4 Hashing: Preprocessing the Stream . . . . . . . . . . . . . . . 38

3.2.4.1 Hashing: Creating a List of Elements to be Hashed . 38

3.2.4.2 Hashing: Building a Set of Hashing Functions . . . . 38

3.2.4.3 Hashing: Creating a NoteHash for Every Element . . 39

3.2.4.4 Hashing: Building finalHash . . . . . . . . . . . . . 39

3.3 Previous Work: Low-level Object Comparison Optimizations . . . . . 39

4 Implementation of OMRMIDICorrector 41

4.1 OMRMIDICorrector . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.1.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.1.1.1 processRunner: preprocessStreams . . . . . . . . . . 43

4.1.1.2 processRunner: setupHasher . . . . . . . . . . . . . . 45

4.1.1.3 processRunner: alignStreams . . . . . . . . . . . . . 45

4.1.1.4 processRunner: fixStreams . . . . . . . . . . . . . . . 45

4.2 Hasher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

8

Page 9: An Efficient Score Alignment Algorithm and its Applications

4.2.1 The Default Hasher . . . . . . . . . . . . . . . . . . . . . . . . 46

4.2.2 NoteHashWithReference . . . . . . . . . . . . . . . . . . . . . 46

4.3 Aligner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.3.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.3.2 Producing a Global Alignment . . . . . . . . . . . . . . . . . . 48

4.3.2.1 Pre-Alignment: Setting Appropriate Parameters . . . 48

4.3.2.2 Pre-Alignment: Hashing the Streams . . . . . . . . . 49

4.3.2.3 Alignment: Initializing the Distance Matrix . . . . . 49

4.3.2.4 Alignment: Populating the Distance Matrix . . . . . 50

4.3.2.5 Alignment: Backtrace to Find the Best Alignment and

Create the Changes List . . . . . . . . . . . . . . . . 52

4.3.2.6 Post-Alignment: Measures of Similarity . . . . . . . 53

4.3.2.7 Post-Alignment: Visual Display of Alignment . . . . 53

4.4 Fixer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.5 Tradeoffs and Design Decisions . . . . . . . . . . . . . . . . . . . . . 55

4.5.1 Modularity vs. Simplicity . . . . . . . . . . . . . . . . . . . . 55

4.5.1.1 A Variety of Settings and a Longer Setup . . . . . . 55

4.5.1.2 Length Agnosticism . . . . . . . . . . . . . . . . . . 56

4.5.2 Performance vs. Space and Object Overhead . . . . . . . . . . 56

4.5.2.1 NoteHashWithReference . . . . . . . . . . . . . . . . 56

4.5.2.2 Creating One Hasher and Many Aligners . . . . . . . 56

4.5.3 Manual User Input . . . . . . . . . . . . . . . . . . . . . . . . 57

5 Examples and Results 59

5.1 Eine Kleine Nachtmusik, K.525- I.Allegro, W. A. Mozart . . . . . . . 59

5.1.1 Musical Properties of K.525 . . . . . . . . . . . . . . . . . . . 60

5.1.1.1 Bass and Cello Doubling . . . . . . . . . . . . . . . . 61

5.1.1.2 Tremolos . . . . . . . . . . . . . . . . . . . . . . . . 61

5.1.2 Preparing the Raw Input . . . . . . . . . . . . . . . . . . . . . 61

5.1.3 Running OMRMIDICorrector on K.525 . . . . . . . . . . . . . 62

9

Page 10: An Efficient Score Alignment Algorithm and its Applications

5.1.3.1 A Closer Look at Specifics of processrunner . . . . 63

5.1.4 Example 1: Similarity Metrics in K.525 . . . . . . . . . . . . . 63

5.2 String Quartet No.7 in E-flat major, K.160- I.Allegro, W. A. Mozart . 64

5.2.1 Running EnharmonicFixer on K.160, Violin 1 . . . . . . . . . 64

5.2.2 Analysis of Correction . . . . . . . . . . . . . . . . . . . . . . 65

5.3 Results: Timing Tests . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.3.1 Runtime Analysis: Hasher . . . . . . . . . . . . . . . . . . . . 66

5.3.2 Runtime Analysis: Aligner . . . . . . . . . . . . . . . . . . . 66

5.3.3 Runtime Analysis: Fixer . . . . . . . . . . . . . . . . . . . . . 67

5.3.4 Runtime Analysis: OMRMIDICorrector . . . . . . . . . . . . . 67

5.3.5 Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.3.6 %timeit: Hasher . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.3.7 %timeit: Aligner . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.3.8 %timeit: Fixer . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.3.9 %timeit: OMR/MIDI Corrector . . . . . . . . . . . . . . . . . 68

5.3.10 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.3.11 Implications of Free Music . . . . . . . . . . . . . . . . . . . . 69

A Alignment Visualization of K.525 71

B Alignment Visualization of K.525 75

C Alignment Visualization of Violin 1 part in K.160 85

10

Page 11: An Efficient Score Alignment Algorithm and its Applications

List of Figures

2-1 An example of protein alignment . . . . . . . . . . . . . . . . . . . . 21

2-2 Alignment of MUSICAL and JUSTICE . . . . . . . . . . . . . . . . . 22

2-3 A bad alignment of MUSICAL and JUSTICE . . . . . . . . . . . . . 24

2-4 The alignment of MUSICAL split at i and JUSTICE split at j . . . . 24

2-5 Initial distance matrix setup for aligning MUSICAL and JUSTICE . 25

3-1 Rhythmic hash of a short melody . . . . . . . . . . . . . . . . . . . . 33

3-2 Melody transposition hash . . . . . . . . . . . . . . . . . . . . . . . . 33

4-1 System diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4-2 An example string quartet stream and its four parts. . . . . . . . . . 44

4-3 A bad alignment setup . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4-4 A better alignment setup . . . . . . . . . . . . . . . . . . . . . . . . . 44

5-1 The first page of the scanned copy of the score found on IMSLP. . . . 60

A-1 Visual display of changes in Violin 1 of target (MIDI) stream. . . . . 72

A-2 Visual display of changes in Violin 1 of source (OMR) stream. . . . . 73

B-1 K.525- I.Allegro MIDI Violin 1 alignment . . . . . . . . . . . . . . . . 76

B-2 K.525- I.Allegro OMR Violin 1 alignment . . . . . . . . . . . . . . . . 77

B-3 K.525- I.Allegro MIDI Violin 2 alignment . . . . . . . . . . . . . . . . 78

B-4 K.525- I.Allegro OMR Violin 2 alignment . . . . . . . . . . . . . . . . 79

B-5 K.525- I.Allegro MIDI Viola alignment . . . . . . . . . . . . . . . . . 80

B-6 K.525- I.Allegro OMR Viola alignment . . . . . . . . . . . . . . . . . 81

11

Page 12: An Efficient Score Alignment Algorithm and its Applications

B-7 K.525- I.Allegro MIDI Cello alignment . . . . . . . . . . . . . . . . . 82

B-8 K.525- I.Allegro OMR Cello alignment . . . . . . . . . . . . . . . . . 83

C-1 Post-alignment visualization of the MIDI Violin 1 part in K.160. . . . 86

C-2 Post-alignment visualization of the OMR Violin 1 part in K.160. . . . 87

C-3 Post-enharmonic and pitch fixing visualization of the OMR Violin 1

part in K.160. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

12

Page 13: An Efficient Score Alignment Algorithm and its Applications

List of Tables

2.1 Naive Operation Cost Function Table . . . . . . . . . . . . . . . . . . 22

2.2 Operation Cost Table With Costlier Substitutions . . . . . . . . . . . 23

2.3 Operation Cost Table With Different Substitution Costs . . . . . . . 23

5.1 similarityScore and changesCount for every pair of aligned streams 64

13

Page 14: An Efficient Score Alignment Algorithm and its Applications

14

Page 15: An Efficient Score Alignment Algorithm and its Applications

Chapter 1

Introduction

1.1 Motivation

1.1.1 Why Compare Music?

Comparing music is seminal to the field of Music Information Retrieval (MIR). Ques-

tions such as “Is this piece of music plagiarized from Bach?”, “Is this rhythmically

more similar to Philip Glass or Scott Joplin? Melodically?”, “Is this newly discovered

musical transcription an entirely new folk song or just a variation of an old one?” can

be answered by comparing the pieces in question and applying a quantitative metric

to determine that, “Yes, this piece quotes Bach more than other pieces of similar time

period and style”, or, “This piece has 74% melodic similarity to Joplin’s repertoire,

10% similarity to Glass, but rhythmically shares 64% similarity with Glass and only

12% with Joplin.”

Computational music comparison, a subset of classical computer science string com-

parison, comes in many forms and each domain has slightly different specifications for

alignment and comparison.

15

Page 16: An Efficient Score Alignment Algorithm and its Applications

The domain of music comparison that I will be studying and building tools for will

be comparison between a MIDI score, which represents music in a restricted manner

and its OMR counterpart, generated from a scanned score. Each representation is

imperfect on its own.

MIDI represents music in a limited manner. This format encodes enharmonic notes

as the “same”; note lengths are relative to each other, not necessarily in relation to

a defined time signature; there is freedom in representing duration, as long as the

total length is correct. Many other design decisions that make MIDI very universally

compatible but also lacking in definition. Moreover, since MIDI scores are generally

created with human input, they tend to be correct, pitch-wise and rhythmically.

OMR takes a score that is likely to be correct (the idea being this is a printed and

published work, and also likely proofed for errors by the composer or publisher), scans

it and tries to recreate what it scans in a digital notation format. While the more

commonplace optical character recognition (OCR) has been optimized and whittled

to an extremely high standard, OMR is still at present only 90% correct [12]. The

space that OMR is in with is much larger and less well-defined than the 52 capital

and lowercase letters and punctuation that OCR has been optimized for. Therefore,

much of the optimizations for OCR do not directly apply to OMR. Some common

errors in OMR include incorrectly recognizing a sharp as a natural and vice versa,

mistaking rests as notes and vice versa, and other errors that any musician trained in

the basics of music theory could identify as “wrong”.

1.1.2 Why Correct Music?

The quality of OMR should be able to be greatly improved is rules such as: (1) all

measures must sum to the same duration, and (2) only notes with existing accidentals

can be made natural, were taken into account. Computer-readable scores are very

powerful because they can be analyzed, replicated, and distributed, and OMR is just

way of producing these scores.

16

Page 17: An Efficient Score Alignment Algorithm and its Applications

A big idea is that we can compare MIDI scores and OMR scores, properly align them

for as many points in the music as possible, and then use the correct data present in

the MIDI score to correct for scanning errors in the OMR score, while retaining the

richness of OMR’s musicXML format.

1.2 Chapter Summary

Chapter 2 will discuss background information that is important to understanding

my work, including basic musical terms, an overview of canonical sequence alignment

methods. as well as related work in musical sequence representation, alignment, and

correction. Chapter 3 discusses previous work. It will summarize related work in

musical sequence representation, alignment, and correction by others in the field. It

will also summarize the work I did previously that is used as a building block for my

thesis work and briefly discusses some ideas that were not as promising to develop

further. Chapter 4 will be a deep dive into the final implementations of the three

main modules of my work and provides a concrete example of how the entire system

would work. An example of correction taken from Mozart along with performance

and testing metrics are discussed in Chapter 5.

17

Page 18: An Efficient Score Alignment Algorithm and its Applications

18

Page 19: An Efficient Score Alignment Algorithm and its Applications

Chapter 2

Background and Related Work

2.1 Computer Musical Terminology

2.1.1 OMR

OMR, or Optical Music Recognition, is method of recognizing the characters on

scanned sheet music or printed scores and interpreting them into a digital form. OMR

is great for large-scale fast digitization of scores. However, the current state-of-the-art

of OMR leaves much room for improvement [12].

2.1.2 MIDI

MIDI is the protocol with which electronic instruments and computers can commu-

nicate with each other. Many people associate MIDI with an idea of low-quality

computer music, but it is just an instruction-like protocol, not actually an audio

recording. At the basic level, MIDI messages encode notes with note-on, note-off

times, and a numbers that corresponds to their frequencies. A piece of music can

be encoded in MIDI as a sequence of notes. On the other hand, MIDI files contain

19

Page 20: An Efficient Score Alignment Algorithm and its Applications

sequences of MIDI protocol events time-stamped for playback in a certain order.

2.2 Non-musical Work in Sequence Alignment

It is far easier to understand sequence alignment in a non-musical context (here the

context is plain string alignment), understand the rules and assumptions made in

the non-musical context, generalize them, and then recreate music-specific rules and

assumptions.

Readers familiar with the canonical representation of the string alignment problem as

a dynamic programming matrix can skip this section.

2.2.1 An Overview of Sequence Alignment

Sequence alignment in bioinformatics is an application of well-studied problem. In this

field, researchers want to identify similar regions of DNA, RNA, or protein sequences to

study functional, structural, or evolutionary relationships between the two sequences.

There are various computational algorithms that can be used to solve sequence

alignment including slower but more accurate dynamic programming algorithms and

faster but less accurate probabilistic or heuristic-based algorithms. In my work, it

makes more sense to use the slower but more accurate methods, as our datasets are

not so large and small mistakes that might be permissible to overlook in larger data

could be much more egregious in smaller contexts.

The specific algorithm chosen to implement for solving the musical sequence alignment

problem is a version of the Needleman-Wunsch dynamic programming algorithm for

global sequence alignment [9]. 1

1This algorithm has a history of multiple invention. Needleman-Wunsch was first published in the

context of protein alignment in bioinformatics. People with a more theoretical computer science bent

might know the same algorithm by a different name, Wagner-Fisher.

20

Page 21: An Efficient Score Alignment Algorithm and its Applications

Figure 2-1: An example of protein alignment[11]

At a high level, the Needleman-Wunsch algorithm and its family of dynamic pro-

gramming sequence alignment algorithms calculate the edit distance between two

sequences by producing a least-cost alignment of the sequences. An alignment is an

assignment of pairwise characters and edit operations. Overall cost of an alignment

(i.e. its edit distance) is determined by summing the individual costs of insertion,

deletion, substitution, or no-change operations between pairwise characters in order

to morph one sequence into the other.

The next sections discuss coming up with edit distance calculations and then the

method that Needleman-Wunsch uses to find the best alignment.

2.2.1.1 Edit Distance

The term edit distance is a way of quantifying how dissimilar two sequences are from

each other. If sequences are represented as an array of characters, then sequences can

transform into each other through a series of insertion, deletion, substitution, and

no-change operations, with each operation “costing” a certain amount. Edit distance

21

Page 22: An Efficient Score Alignment Algorithm and its Applications

is the smallest possible total cost associated with the transformation that is a series

of operations that turns one sequence into another. As an example, consider the

calculation of the edit distance between the two strings MUSICAL and JUSTICE

that uses this series of operations for transforming MUSICAL into JUSTICE:

1. M ! J (substitution)

2. ✏ ! T (insertion)

3. A ! E (substitution)

4. L ! ✏ (deletion)

Note that implicitly excluded are all the no-change operations that look like this:

1. U ! U (no-change)

Operation CostInsertion 1Deletion 1Substitution 1No-change 0

Table 2.1: Naive Operation Cost Function Table

Figure 2-2 corresponds to an alignment that looks like figure 2-2. The green pair of

characters indicate an insertion operation, red is deletion, and purple is substitution.

These colors are also used in the visualization of OMR/MIDI stream alignment.

seq1 M U S _ I C A Lseq2 J U S T I C E _

Figure 2-2: The two strings have an edit distance of 4, because it requires four operationsto completely morph one string into the other (discounting no-change operations).

One cost refinement that could be made is to slightly increase the cost of substitution.

Since a substitution can be thought of as a consecutive insertion and deletion operation,

but more compact, it should have a cost greater than that of a single insertion or

deletion, but less than twice the cost of a single insertion/deletion.

22

Page 23: An Efficient Score Alignment Algorithm and its Applications

A slightly different cost function that assigns more cost to substitution changes could

look like table 2.2. Using this cost function, the edit distance of changing MUSICAL

into JUSTICE is 5.

Operation CostInsertion 1Deletion 1Substitution 1.5No-change 0

Table 2.2: Operation Cost Table With Costlier Substitutions

Yet another cost refinement is to give different subtypes of a single operation different

weights. Consider if the application of trying to run a spell checker (one application

of string alignment and correction algorithms) on a word that isn’t recognized by the

dictionary. In the particular spell check model, perhaps substitution of a vowel for a

vowel isn’t as costly as any other types of substitution, so vowel for vowel should only

have a cost of 1.2. Using this new cost function, the new edit distance is 4.7.

Operation CostInsertion 1Deletion 1Substitution - vowel for vowel 1.2Substitution - all others 1.5No-change 0

Table 2.3: Operation Cost Table With Different Substitution Costs

Considerations such as these go into the creation of a musical sequence aligner as well.

2.2.1.2 Alignment

Recall that the edit distance between two sequences is defined as the least-cost

alignment. Figure 2-3 shows a valid alignment, but it is certainly not the least-cost

alignment, and therefore does not correspond to the edit distance between between

the two sequences, as it requires four substitutions, one insertion, and one deletion.

23

Page 24: An Efficient Score Alignment Algorithm and its Applications

seq1 _ M U S I C A Lseq2 J U S T I C E _

Figure 2-3: A bad alignment of MUSICAL and JUSTICE

If there are two sequences, of length n and m, then they have�n+mm

�possible alignments.

Put into context, consider two melodies of 10 notes each. Naively aligning just the

note pitches, there are 184,765 different possible alignments! Clearly this isn’t scalable,

which is why this piece of insight is important: A globally optimal alignment contains

locally optimal alignments! This means that the computations of subproblems can be

reused to solve larger problems. What follows from this piece of insight is that for

two strings split at (i, j), the best alignment is:

best alignment of string1[: i] and string2[: j] + best alignment of string1[i :] and

string2[j :]

i#

seq1 M U S I C A Lseq2 J U S T I C E

"j

Figure 2-4: The alignment of MUSICAL split at i and JUSTICE split at j

If all the possible alignment subproblems are kept track of, it is possible to recursively

build an optimal solution using the answers from the subproblems. One way to

represent this is with a scoring matrix indexed by i and j. For the problem of aligning

MUSICAL and JUSTICE, the setup would look something like in figure 2-5

Recall that the problem has been defined in three different ways:

1. Aligning MUSICAL and JUSTICE

2. Finding the edit distance between MUSICAL and JUSTICE

3. Finding a series of operations that transforms MUSICAL into JUSTICE

24

Page 25: An Efficient Score Alignment Algorithm and its Applications

- J U S T I C E-MUSICAL

Figure 2-5: Initial distance matrix setup for aligning MUSICAL and JUSTICE

All of these problem definitions are (almost) one and the same. Calculating the edit

distance will also provide a good alignment of the two words and a good alignment

necessitates a series of character change operations. It is important to know that (3)

has an implicit directionality built into the problem statement. It won’t always be the

case that alignments and transformations are symmetric (i.e. cost the same and have

the same series of operations), but having a simple cost operation functions, such as

in tables 2.1 and 2.2, will make such problems symmetric.

In this particular sample problem, it is always the case that the word MUSICAL is

transformed into another word, JUSTICE. In this particular context, OMR sequences

are transformed into MIDI sequences.

After the scoring matrix is set up, it is populated with initial values:

1. 0 in (0, 0)

2. i� 1 · insertioncost for i in (i, 0) (this is the first column)

3. j � 1 · deletioncost for j in (0, j) (this is the first column)

In reading a scoring matrix setup like this, insertions are vertical movements, deletions

are horizontal movements, and substitutions are diagonal. After setting up the initial

matrix, all of the remaining slots are filled in using this update rule:

25

Page 26: An Efficient Score Alignment Algorithm and its Applications

D[i][j] =min {

D[i-1][j] + insCost (the cell above),

D[i][j-1] + delCost (the cell to the left),

D[i-1][j-1] + subCost (the cell diagonal, up and to the left)

}

2.3 Turning Sequence Alignment into a Musical

Problem

Finding the edit distance of a musical sequence as opposed to a string of ASCII

characters is a little trickier. First, there isn’t an intuitive “space” of music elements

the way that there is a “space” of all characters. Second, even after identifying the

musical elements, there isn’t an intuitive metric of distance between elements. For

example, how would the distance between an A# quarter note in the 4th octave and

a B quarter note in the 4th octave compare with an A# eighth note in the 4th octave

and an A# quarter note in the 4th octave?

By massaging musical sequences into string-like objects, it becomes easier to see how

techniques in string alignment can be applied to musical sequence alignment. The

questions and answers posed below help identify parallels between strings and musical

sequences.

• What are sequences are and what makes them up?: musical sequences can be

thought of as an ordering of notes, rests, and chords.

• What is the space that sequences occupy?: This question can roughly be

approximated as, what are the important properties of notes, rests, chords?

Some immediate answers are pitch, duration, offset.

26

Page 27: An Efficient Score Alignment Algorithm and its Applications

• What is an appropriate metric for distance?: In the previous section there were

multiple different substitution functions. What kind of and how many ways are

there to calculate distance between musical elements? Perhaps representing a

musical element as a tuple of numbers, (pitch, duration, offset) gives us points

in space that we can calculate distance.

Once musical sequences can be represented in string-like formats, we can perform

string operations on them. The next chapter discusses related work by others who

have explored some part of this musical sequence alignment question as well as my

own previous work that relates to this thesis.

27

Page 28: An Efficient Score Alignment Algorithm and its Applications

28

Page 29: An Efficient Score Alignment Algorithm and its Applications

Chapter 3

Previous Work

This chapter discusses previous work in musical sequence representation, alignment,

and correction. The first half of the chapter is about work by others. The second half

is my own previous work that was relevant to this thesis.

3.1 Related Work by Others

3.1.1 Typke: Music Retrieval based on Melodic Similarity

Typke’s thesis on Music Retrieval based on Melodic Similarity gives an example of

how music can be transformed and compressed into its most relevant features [6].

Here, Typke proposes encoding music into what he calls a “weighted point set" in

a two-dimensional space of time and pitch before measuring any sort of melodic

similarity between music. A note in a piece is represented as point set of pitch (in the

Hewlett 40 system [4]) and an offset in quarter notes from the beginning of the piece.

This representation allows for notes to be easily compared with each other. Since

the features of notes are represented as numbers, it is convenient to do numerical

operations with the representations.

29

Page 30: An Efficient Score Alignment Algorithm and its Applications

3.1.2 Church and Cuthbert: Rhythmic Comparison and

Alignment

Church and Cuthbert [2] describe a system that hashes scores purely on rhythm and

uses heuristics of rhythm in music (e.g. a repeated rhythm should be the same every

time it is expected to repeat) to correct OMR scores. This model was the original

inspiration for building an alignment system with more heuristics pertaining to the

qualities of music beyond just rhythm to further improve OMR accuracy.

3.1.3 IMSLP

IMSLP holds a large collection of musical scores (currently over 350,000), many of

which are user contributed (i.e. not necessarily verified correct) [5]. However, its

contents are free and in the public domain. My work makes use of this large dataset.

3.1.4 Viro: Peachnote, Musical Score Search and Analysis

Peachnote [14] is a project that contains a large bodyof musical scores. Peachnote

performs OMR over the PDF scores in IMSLP and indexes the data for efficient

querying and searching.

3.1.5 White: Statistical Properties of Music and MIDI

White’s dissertation [15] describes properties of music that can be turned into heuristics

for refining algorithmic analysis of music. Specifically, he builds upon Temperley’s

1997 [13] paper’s details of harmonic analysis of MIDI and refers to other research that

has been constructive towards Temperley’s model. White does say that Temperley’s

algorithm fails when faced with ambiguity, such as in MIDI differentiating between

enharmonic tones. The hope is that by combining the ambiguous parts of MIDI with

30

Page 31: An Efficient Score Alignment Algorithm and its Applications

the probabilistic and rigorous details of OMR scores, ambiguities in MIDI tonality can

be resolved. In addition, White also cites a large repository of high quality MIDI files.

3.1.6 Rebelo et al: Optical Music Recognition -

State-of-the-Art and Open Issues

Rebelo et al’s paper discusses the state of the art of OMR at each step of the OMR

process and proposes future work in certain areas that would advance the current

state of OMR [12].

3.1.7 OpenScore

OpenScore is a project that aims to liberate all public domain music using both

human and computer-powered techniques. Many works in the public domain are

of low quality and/or are in formats that do not lend well to analysis, sharing, or

reproducing, and OpenScore plans to turn them into high-quality music in digital

formats. Both humans and computer software can aid in any part of the process, from

transcription to error correction. The work of my thesis can directly contribute to the

computer software tools that OpenSource is looking for.

3.1.8 music21

music21 is a open source Python library that is a toolkit for computer-aided musicology.

It provides both a framework for representation of music as well as tools to perform

analyses on music. The rhythm correctors built by Church and Cuthbert were made

using the tools that music21 provides. The work of my thesis relies heavily on the

infrastructure ofmusic21, its score and stream representations and their operations.

31

Page 32: An Efficient Score Alignment Algorithm and its Applications

3.2 Previous Work: A Modular Universal Hasher

The work of my thesis uses the universal Hasher for digital representations of music

tool. This work was largely done in the Spring of 2015 as part of my undergraduate

capstone supervised independent study (6.UAP).

Prior to my work on building a musical sequence aligner and fixer, I built a hasher with

the specific intention that it would be able to adapt according to different specification

settings decided by the programmer. The programmer decides which kinds of music21

stream elements, such as notes and chords, and properties, such as pitch name or

duration, are relevant to their hash, and selects them to build a specific instance of a

Hasher object. This Hasher instance can then be used to produce a hash from any

music21 stream.

3.2.1 Why Musical Comparison and Alignment Needs a

Modular Hashing Function

A hash function maps pieces of data of arbitrary size into data of fixed size and

standard representation. If musical sequences could be discretized into elements and

if these elements could be hashed, then the processes of comparison and alignment

could be made deterministic and quick.

Depending upon the problem statement that calls for a hash in the first place, this is

a two step process:

1. discretizing a musical sequence into a list of relevant elements

2. hashing those elements

The details of this process can vary. For instance, in determining the rhythmic

similarity between two musical sequences, only the musical elements with durations

32

Page 33: An Efficient Score Alignment Algorithm and its Applications

(e.g. notes, chords, rests) in each sequence would be relevant. In hashing these

elements, only properties such as duration and offset might be relevant.

!" # ! $ %Figure 3-1: The hash of rhythmic features of this short melody would be NoteHash(offset=0,duration=0.5), NoteHash(offset=0.5, duration=0.5), NoteHash(offset=1.0, duration=2.0)NoteHash(offset=4, duration=1.0)

As another example, in determining whether one piece is an approximate transposition

of another, only the elements with some relation pitch (e.g. notes and chords) might

be relevant.

!" # ! $ %Figure 3-2: The hash of features related to transposition of this shortmelody would be NoteHash(intervalFromLastNote=0), NoteHash(intervalFromLastNote=-2),NoteHash(intervalFromLastNote=-5)

Once a musical sequence is discretized and represented as a sequence of NoteHash

objects, each individual tuple in the hash sequence can be thought of as a point in

the space of all combinations of possible element properties. Then, canonical edit

distance or string alignment algorithms can then be used to come up with a measure

of similarity or an alignment between the two original musical sequences.

3.2.2 Previous Work on Modular Hashing Functions

The idea of extracting qualities of music specific to the needs of distance analyses and

encoding that information in a lower-dimensional representation is not new. Typke in

his thesis proposes what he calls a weighted point set representation of music [6]. His

method of creating a weighted point set for a stream of music involves only taking

melodic information (i.e. notes) and representing each discrete note as a three-tuple

of (time, pitch, weight). The time component encodes information about the note’s

33

Page 34: An Efficient Score Alignment Algorithm and its Applications

offset from the beginning. The pitch component encodes the note’s pitch using the

Hewlett’s base-40 system (this is a numerical encoding similar to MIDI pitch but

is also able to encode more information than just absolute frequency). The weight

component is an assignment of how important the particular note is to the melody of

the music stream." as something like "The weight component is an assignment of a

numerical value to each note in the music stream, indicating the relative importance

of the different notes within the melody [4].

3.2.3 Overview of System

The Hasher object is found in hasher.py. Its purpose is to identify the relevant

features of relevant elements in any music stream and to represent them in a hashed

format that is easy to compute with. Relevant elements here means objects such

as notes, rests, chords. Relevant features here means qualities of these elements

such as pitch, duration, offset. I represent these qualities in the hash as lightweight

integers, floats, and strings, which makes future computations such as comparisons,

and numerical operations easy and fast.

Upon instantiation, Hasher does not take in any parameters. After an instance of

Hasher is created, however, the user can set specific parameters in the Hasher to fit

their own needs. The user must specify which relevant stream elements (notes, rests,

chords) should be hashed, as well as which relevant qualities of those elements should

be hashed (pitch, duration, offset, etc.).

After the hashing parameters are set, the user calls the hashStream method on a

stream. This method strips the stream only to the relevant elements, creates a set

of hash functions based on the user’s selection of relevant element qualities, and

individually hashes each element with the entire set of specified hashing functions.

The Hasher will create either a NoteHash or a NoteHashWithReference object for

each element and add it to the end of the FinalHash list. Both are tuples that

34

Page 35: An Efficient Score Alignment Algorithm and its Applications

hold the hashed qualities of a single stream element. The only difference is that a

NoteHashWithReference stores a reference back to the original stream element. Even

though this has a lot of overhead, it is necessary if ever the original streams need to

referenced for context or if they need to be changed.

The output of the entire process is a list of NoteHash or NoteHashWithReference

objects, stored in finalHash.

3.2.3.1 Pre-Hashing: High Level Hasher Settings

There are several system-wide high level settings that must first be set before the

hashStream method can be called.

• includeReference - By default this is set to False, but if set to True, each

element hash will be stored in a NoteHashWithReference object that includes

a reference to the original element in its original stream. For many purposes,

having this reference is not necessary and takes up extra overhead, so it is set to

False by default.

• validTypes - This is a list of the types of stream elements should be hashed. By

default this is [note.Note, note.Rest, chord.Chord] and can be changed to

be any subset of those three stream elements.

• stateVars - This is a dictionary that keeps track of state between consecutive

hashes. By default it doesn’t contain anything, but can be programmed to

contain intervalFromLastNote and KeySignature during the later process of

preprocessing the streams and setting the appropriate hashing functions if the

user-set parameters call for their use.

• hashingFunctions - This is the dictionary that stores maps which hashing

functions should be used to hash which qualities. For example, if the user chose

to hash notes and their MIDI pitch values, then the dictionary value for the key

35

Page 36: An Efficient Score Alignment Algorithm and its Applications

Pitch would be _hashMIDIPitchName, the name of the function that takes a

note and returns its MIDI pitch.

3.2.3.2 Pre-Hashing: Low Level General Settings

These settings affect more than one of notes, rests, chords.

• _hashOffset - If True, this will include in the hash a number that is associated

with the offset of a musical element from the beginning of the stream.

• _hashDuration - If True, this will include in the hash a number that is associated

with its duration (measured in a multiplier of duration.quarterLength) .

• _roundDurationAndOffset - This is a boolean for determining whether offsets

and durations should be rounded. This is useful for files (e.g. MIDI recordings

without rounding) where beginnings of notes, rests, and chords may not fall

exactly on the beat (e.g. beat 2.985 instead of 3).

• granularity - A number that determines how precisely we round. If duration

and offset are rounded, they will be rounded to this granularity of beat division.

For example, a granularity of 32 will round notes to the nearest 32nd beat division

and no further.

3.2.3.3 Pre-Hashing: Low Level Note Settings

• _hashPitch and _hashMIDI - These settings determine whether the pitch of a

note will be hashed, and if it is, how it will be represented. If _hashMIDI is

True, then the MIDI pitch value is hashed. Otherwise, its name representation

is hashed (e.g.“C##”). This setting is useful depending upon whether the user

would want to differentiate between B# and C.

• _hashOctave - If True, this includes in the hash the number octave that a note

36

Page 37: An Efficient Score Alignment Algorithm and its Applications

is in. This could be useful for an envelope-like following of pitches.

• _hashIntervalFromLastNote - If True, this returns a number corresponding

to the interval between the current note and the last note when applicable.

This setting is a useful alternative to hashing pitch when trying to identify

transpositions of pieces.

• _hashIsAccidental - If True, this will return a boolean indicating whether

the pitch of the hashed element is outside of the key signature. This setting is

currently not fully supported.

3.2.3.4 Pre-Hashing: Low Level Chord Hashing Settings

Chords can be thought of as multiple notes happening at the same time. In music21,

chords are their own type of object that can be decomposed into note objects. In the

hashing system, chords can be hashed as chords, or by their constituent notes.

• _hashPrimeFormString - If True, this includes in the hash of chord a string

representation of its prime form.

• _hashChordNormalOrderString - If True, this includes in the hash of a chord

a string representation of its normal order.

3.2.3.5 Hashing: hashStream

Once the settings of a Hasher instance have been set, we can call its hashStream

method on any stream. The following sections describe the what happens within a

single hashStreams call.

37

Page 38: An Efficient Score Alignment Algorithm and its Applications

3.2.3.6 Hashing: Setting up ValidTypes and StateVars

First, the method setupValidTypesAndStateVars is called to set up the state vari-

ables and the valid hashing types. Recall that by default stateVars is empty and

validTypes is [note.Note, note.Rest, chord.Chord] (of course, the user has al-

ready had the option of changing validTypes to contain any subset of the default

types). This method checks whether the user has set either of these two settings:

1. hashIntervalFromLastNote - if this is set, then IntervalFromLastNote is

added to stateVars to keep track of it.

2. hashIsAccidental - if this is set, then KeySignature is added to stateVars

to keep track of, and key.KeySignature is added to the list of validTypes to

hash.

3.2.4 Hashing: Preprocessing the Stream

The stream that the hashStream method acts upon is stripped of all note ties in

the preprocessStream if the stripTies setting is True. Additionally, this method

returns the stream in a generator form by calling recurse on the stream.

3.2.4.1 Hashing: Creating a List of Elements to be Hashed

Next the hashStream method builds a list of elements that are to be hashed by

filtering the recursed stream of any types that are not in validTypes. This is stored

in finalEltsToBeHashed.

3.2.4.2 Hashing: Building a Set of Hashing Functions

The method setupTupleList is called, and it sets up tupleList, hashingFunctions

and tupleClass, all related to each other. tupleList is a list of all the element

38

Page 39: An Efficient Score Alignment Algorithm and its Applications

properties that should be hashed. hashingFunctions is a dictionary of which hashing

function should be used for each property. tupleClass is a namedtuple that is

constructed ad hoc based on which properties are to be hashed. The ad hoc construction

accommodates for the fact that different Hasher instances will contain different things

to hash.

3.2.4.3 Hashing: Creating a NoteHash for Every Element

For each of the elements in finalEltsToBeHashed, the Hasher hashes its relevant

properties using the hashing function listed in hashingFunctions. It then creates a

single NoteHash that stores all the hashed properties.

3.2.4.4 Hashing: Building finalHash

If includeReference is set to False, then each new NoteHash object is directly added

to finalHash. If includeReference is True, then each NoteHash gets replaced with

a NoteHashWithReference object that includes a reference to the original stream

element, and that gets added to finalHash instead.

3.3 Previous Work: Low-level Object Comparison

Optimizations

Work during the fall of 2015 focused on optimizing the comparison process with

variable-length data. The underlying idea was that Python objects tend to have more

overhead than C objects, and if comparison and alignment algorithms were going to

be doing a lot of comparisons between NoteHash objects produced by the Hasher,

then converting these Python objects into C objects of appropriate size could save a

lot of time. Here, “appropriate size” refers to the idea that NoteHash objects with

39

Page 40: An Efficient Score Alignment Algorithm and its Applications

only a few hashed properties would correspond to smaller C objects, and NoteHash

objects with more hashed properties would correspond to larger C objects.

A smart implementation of the most efficient method, Yeti Levenshtein, already exists

in C, so my work was to cleverly convert variable-length NoteHash Python objects

to C objects to use the Yeti Levenshtein library, which I hoped would be faster than

using any Python edit distance algorithm.

I observed that the speedup we gained in converting variable-length NoteHash Python

objects to C objects to use this library was a negligible cost compared to the run

time of the comparison algorithm. This empirical evidence suggested that massive

parallelization and speedup of the comparison process is difficult and that optimizations

in alignment would be the key to significant speedup.

Thus, I concluded that musical comparison was already near the boundaries of its

optimization and my work turned to focus on creating a robust alignment algorithm,

which is described in the next chapter.

40

Page 41: An Efficient Score Alignment Algorithm and its Applications

Chapter 4

Implementation of OMRMIDICorrector

This chapter describes the final implementation of the entire OMR/MIDI musical

stream alignment and correction system. First there will be an overview of the entire

system. Then, there will be discussion on the final implementation of each of the large

modules that is divorced from the context of the OMRMIDICorrector class. Afterwards,

I will present an example use case of the system. I will conclude with a discussion of

design tradeoffs and choices that I made in implementing the system.

The final implementation of the OMR/MIDI musical stream alignment and correction

system. can be found in omrMidiCorrector.py. The OMR system is a combination

of one or more of each of the Hasher, Aligner, and Fixer modules with specific

parameters set.

The inputs to the system are two music21 streams (one OMR, one MIDI). The output

of the system is a corrected OMR stream.

41

Page 42: An Efficient Score Alignment Algorithm and its Applications

4.1 OMRMIDICorrector

4.1.1 System Overview

The system is the OMRMIDICorrector class in omrMidiCorrector.py. This class is

initialized with a midiStream, an omrStream and an optional Hasher. The Hasher

parameter is optional because the system provides a default Hasher with some all-

purpose settings if no Hasher is passed in. The method processRunner is the main

method of this class that preprocesses the streams, sets up the appropriate Hasher,

and calls the aligning and fixing methods.

Figure 4-1: The inputs to the system are OMR and MIDI streams. The output of thesystem is a corrected OMR stream.

42

Page 43: An Efficient Score Alignment Algorithm and its Applications

4.1.1.1 processRunner: preprocessStreams

The processRunner first verifies the streams that are passed in and preprocess them

before being hashed, aligned, and fixed in the method preprocessStreams.

preprocessStreams begins by discretizing the parts of the stream if the user has set

discretizeParts to be True. Discretizing parts means to split the input streams

into their constituent parts, so that the input streams are aligned part-wise. Since

music is often comprised of multiple parts, this preprocessing step allows for input

streams that contain nested streams. Take for example, an OMR/MIDI alignment of

a string quartet. Both the OMR and MIDI inputs are stream in and of themselves,

and so are all individual instrumental parts within both of the streams.

A good alignment of the piece would involve aligning corresponding instrumental parts

with each other to ensure that the beginnings and ends of corresponding parts aligned

with each other. However, if we treat the inputs both as one long stream composed of

four serial streams instead of four parallel streams, only the beginnings of the first

instrumental parts and the ends of the last instrumental parts would be guaranteed to

be aligned with each other. This guarantee comes from the global alignment method

based on the Needleman-Wunsch algorithm that was implemented. Figures 4-2, 4-3,

and 4-3 illustrate the phenomenon.

By default, discretizeParts is set to True. However, the user ultimately has the

choice to align input streams part-wise or After a stream is split up into its constituent

parts, the parts are stored in the lists midiParts and omrParts. The aligners and

fixers work on all of the elements in the list, and so the entire system is agnostic to

the number of parts; a stream comprised of a single part is processed in the same way

as a multi-instrumental score.

Then, preprocessStreams checks if the number of parts in MIDI and the OMR stream

are the same. If they are off by more than one, then a OMRMIDICorrectorException

is raised and the process halts. This is a cursory check to see if the two input streams

43

Page 44: An Efficient Score Alignment Algorithm and its Applications

Figure 4-2: An example string quartet stream and its four parts.

Figure 4-3: A bad alignment setup of the quartet would treat the entire piece as four serialstreams and only the only guaranteed alignment would be between the beginning of the firstpart and the end of the last part.

Figure 4-4: A better alignment setup of the quartet would do part-wise alignment insteadof treating the entire piece as a sequence of parts. This methodology guarantees that thebeginnings and ends of each part are aligned.

can even reasonably be aligned.

If the numbers of parts are off by exactly one, then preprocessStreams spins up

a lightweight StreamAligner object to try and see if there is an implicit bassline,

where the Bass part doubles the Cello part but an octave lower. This is a common

44

Page 45: An Efficient Score Alignment Algorithm and its Applications

feature of orchestral scores. In such a scenario, the OMR score would have one fewer

part than the MIDI score because a bassist would read the cello line while reading

sheet music, but the MIDI score would explicitly list all the instrument lines. If the

similarity of the two streams is above a certain threshold, it is reasonable to assume

that the Bass part doubles the Cello part and the extra MIDI part is disregarded

in the alignment process. The checkBassDoublesCello method and its use of an

StreamAligner object to determine whether two streams are an octave apart is a

incidental example of a different use case of the StreamAligner.

4.1.1.2 processRunner: setupHasher

The next step in the processRunner method is setting up the Hasher object that will

be used to hash both input streams. This is done in the setupHasher method. If user

has passed in a Hasher object upon instantiation of an OMRMIDICorrector object,

then that is used. Otherwise this method sets up a default Hasher with settings

found to be generally effective in the setDefaultHasher method. The default Hasher

hashes only pitch and duration and includes references to the original stream elements.

4.1.1.3 processRunner: alignStreams

This method creates a StreamAligner object for each pairwise MIDI part and OMR

part found in midiParts and omrParts and aligns the two parts, calculates their

similarity metrics, and visually shows the changes and alignment if debugShow is True.

There is more detail about how the StreamAligner object works in section 4.3.

4.1.1.4 processRunner: fixStreams

The last step is of the entire process is to fix the OMR stream. Because the Fixer

still remains in development, the description of this method and the Fixer in section

4.4 will remain at a high level.

45

Page 46: An Efficient Score Alignment Algorithm and its Applications

In its final implementation, fixStreams should create instances of the OMRMidiFixer

object, one per pair of aligned streams and their changes list. This is the base class

that all other Fixer objects inherit from. The user would be able to choose which

Fixer objects they want to create and use on all the pairs of streams.

Currently, in order to use the two available child Fixer objects, a user can manually

create OMRMidiFixer and DeleteFixer and/or EnharmonicFixer objects using the

changes lists that processRunner creates.

4.2 Hasher

The Hasher object used in the final implementation is a specific instance of the Hasher

object described in section 3.2. The user can choose to pass in their own instance of

Hasher or use the default Hasher provided.

4.2.1 The Default Hasher

The default Hasher, as described in section 4.1.1.2, hashes only pitch and duration

of stream elements and set includeReference to be True. This setting is general

enough to provide for a reasonable alignment for most generic alignment problems.

But for more niche problems (for example, a rhythmic alignment), the user ought to

define their own Hasher and the default Hasher should not be used.

4.2.2 NoteHashWithReference

One update made to the original Hasher was the addition of a new type of NamedTuple

that includes a reference to the original element that is hashed and stored within

the tuple. This new type provided access to the original stream elements which was

necessary for three reasons:

46

Page 47: An Efficient Score Alignment Algorithm and its Applications

1. It allows the user to visually see the alignment using the showChanges method

in the Aligner. A list of the changes between two streams can be hard to parse

and see patterns within, so this method provides a more visually appealing way

of interpreting the list of changes.

2. It allows the user to access other elements in the context of the original stream

element. For example, it might be helpful to be able to access the most recent

key signature change in the stream for some fixing purpose.

3. In order for the Fixer to fix the OMR stream, it must be able to change the

original elements. The reference to the original stream element allows for this.

4.3 Aligner

4.3.1 System Overview

aligner.py contains the StreamAligner class that is the engine of the alignment

step. StreamAligner accepts as input two music21 streams, one specified as a source

stream, one as a target stream. The main functionality of StreamAligner is that

it produces a global alignment between the source and the target streams in the

form of a list of Change Tuples that maps each element in the source stream to

an element in the target stream via one of the four change operations (insertion,

deletion, substitution, no change). Additionally, StreamAligner also outputs a basic

measurement of similarity between the two stream inputs similarityScore. Lastly

StreamAligner has a show function that visually displays the alignment between the

source and target streams.

47

Page 48: An Efficient Score Alignment Algorithm and its Applications

4.3.2 Producing a Global Alignment

In order to produce a global alignment of two streams, the Aligner must hash

the two streams with an instance of a Hasher. After the two streams are hashed,

StreamAligner sets up a distance matrix, a la the method described in Chapter 2 for

classic string alignment, populates the matrix, and then performs a backtrace of the

matrix starting the lower right corner to produce the alignment.

4.3.2.1 Pre-Alignment: Setting Appropriate Parameters

In the interest of being a modular system, I chose to give the programmer the option

of setting their own Hasher. Upon instantiation of a StreamAligner, the programmer

can choose to pass in an instance of a Hasher. If no Hasher is set initially, then

StreamAligner uses the default Hasher that is set to hash Notes, Rests, and Chords,

and their MIDI pitches and durations. The default Hasher is generic enough to be

able to work with general alignment, but a more niche problem would require the

programmer to use a Hasher more specific to their problem

I also chose to leave the cost of Insertion, Deletion, and Substitution easily substitutable

as well. By default, the cost of both Insertion and Deletion is equal to the length of

one of the NoteHashWithReference tuples, which is exactly the number of properties

that are hashed of any musical element. (So in the case of the default Hasher,

Insertion and Deletion both have cost 2.) The cost of Substitution between two

NoteHashWithReference tuples is equal to the number of properties they don’t have

in common with each other. For example, the cost of substituting

NoteHashWithReference(pitch=59, offset=1.0, duration=2.0)

with either

NoteHashWithReference(pitch=59, offset=1.0, duration=3.0)

48

Page 49: An Efficient Score Alignment Algorithm and its Applications

or

NoteHashWithReference(pitch=60, offset=1.0, duration=2.0)

would have a cost of 1, since each differs in one property. However, substituting

NoteHashWithReference(pitch=60, offset=2.0, duration=3.0)

would have a cost of 3 since all three properties are different.

4.3.2.2 Pre-Alignment: Hashing the Streams

The alignment algorithm can only be applied after the streams are represented in

a hashed format as a list of NoteHashWithReference tuples. Both input streams

will be hashed and stored as the hashedTargetStream and the hashedSourceStream.

The hashed format is the analog to a string and each NoteHashWithReference tuple

is the analog to a character in classic string alignment.

It recommended that the hasher used to the hash the streams be set to include

references to the original stream by using NoteHashWithReference tuples in the

creation of the hash (as opposed to the NoteHash tuple that has much less overhead),

and in the default hasher, This ensures that the show function and future fixers have

access to the original objects that each hash came from. This makes it easier to color

the original objects in the case of the show function and to extract any necessary

metadata that wasn’t encoded in the hash for future fixers.

There is also an option to indicate that the streams passed in are already hashed, but

in the context of my thesis work, this should never be the case.

4.3.2.3 Alignment: Initializing the Distance Matrix

The first step in the alignment process is initializing the distance matrix. We set up an

empty n+1⇥m+1 matrix, where n is the length of the hashed target stream and the

49

Page 50: An Efficient Score Alignment Algorithm and its Applications

m is the length of the hashed source. The extra column and extra row are populated

with initial costs that correspond to just Insertion or just Deletion operations. In the

leftmost column, the value i ⇥ insertCost is put into entries (i, 0). In the topmost

row, the value j ⇥ deleteCost is put into entries (0, j).

4.3.2.4 Alignment: Populating the Distance Matrix

The next step in the alignment process is to fill in the remainder of the distance matrix

with values generated with these update rules, previously seen in Chapter 2.

D[i][j] =min {

D[i-1][j] + insCost,

D[i][j-1] + delCost,

D[i-1][j-1] + subCost

}

where

insCost = insertCost(hashedSourceStream[0])

delCost = deleteCost(hashedSourceStream[0])

subCost = substitutionCost(hashedTargetStream[i-1], hashedSourceStream[j-1])

Each entry A[i][j] in the distance matrix stores the lowest cost for aligning

hashedTargetStream[i] with hashedSourceStream[j]. There are 4 possible ways

of relating the two tuples.

1. hashedTargetStream[i] is an insertion - in the case of insertion, the cost of

50

Page 51: An Efficient Score Alignment Algorithm and its Applications

aligning hashedTargetStream[i] with hashedSourceStream[j] is equal to the

cost of aligning hashedTargetStream[i-i] with hashedSourceStream[j] plus

the cost of insertion.

2. hashedTargetStream[i] is a deletion - similar to the insertion case, for deletion,

the cost of aligning hashedTargetStream[i] with hashedSourceStream[j] is

equal to the cost of the subproblem of aligning hashedTargetStream[i] with

hashedSourceStream[j-i] plus the cost of deletion.

3. hashedTargetStream[i] is a substitution of hashedSourceStream[j] - in

the case of substitution, the cost of aligning hashedTargetStream[i] with

hashedSourceStream[j] is equal to the cost of the subproblem of aligning

hashedTargetStream[i-i] with hashedSourceStream[j-1] plus the cost of a

substitution of hashedSourceStream[j] for hashedTargetStream[i].

4. No change between the two tuples i.e. the two tuples are the same - same as above,

where substitutionCost(hashedTargetStream[i], hashedSourceStream[j]) is

0.

The method populateDistanceMatrix fills in all the entries of the distance matrix

using the rules and calculations listed above. It is important to note that in my

work, all changes are made relative to TargetStream. That is, whenever possible,

SourceStream is the stream that is being transformed into TargetStream. This

invariant holds because in OMR/MIDI correction, the approach and direction I take is

to change OMR into MIDI. Therefore, the MIDI stream is always the TargetStream

and the OMR stream is always the SourceStream. It is certainly possible to go in

either direction, but this paper will do it this way.

51

Page 52: An Efficient Score Alignment Algorithm and its Applications

4.3.2.5 Alignment: Backtrace to Find the Best Alignment and Create

the Changes List

Once the distance matrix has been completely filled in, we use a backtrace starting

from the bottom right hand corner of the matrix (i.e. A[i][j]) to find the path of least

of cost.

Starting from the value found at A[i][j], we look at the values directly above, to the

left, and diagonal in the up-left direction. That is, we look at the values in A[i-][j],

A[i][j-1], and A[i-1][j-1]. Among these three values, we choose the minimum value. The

combination of direction and value that we moved in tells us what kind of operation

was performed to align NoteHashWithReference tuples hashedSourceStream[i] and

hashedSourceStream[j]:

1. if the direction is up, then regardless of the value, this corresponds to an insertion

operation.

2. if the direction is left, then regardless of the value, this corresponds to an

insertion operation.

3. if the direction is diagonal up-left, and the value at A[i-1][j-1] is different from

the value at A[i][j], then this corresponds to a substitution operation.

4. if the direction is diagonal up-left, and the value at A[i-1][j-1] is the same as the

value at A[i][j], then this corresponds to a no-change operation.

We continue this backtrace until we end up back at A[1][1]. Since we are using a

global alignment technique, we should always end up back at this entry. If at the end

of backtrace we end up in another part of the distance matrix, the method throws an

AlignmentException.

Additionally, at every step of the backtrace, we create a ChangeTuple that holds refer-

ences to the original musical elements that are represented in the original SourceStream

52

Page 53: An Efficient Score Alignment Algorithm and its Applications

and TargetStream and the kind of operation (insertion, deletion, substitution, no-

change) that links them. Then we insert this ChangeTuple into the beginning of the

changes list.

This is all performed in the calculateChangesList method.

4.3.2.6 Post-Alignment: Measures of Similarity

After the backtrace to find the alignment, there is enough data to calculate some basic

metrics of similarity.

changesCount is a Counter object that provides a count of how many of each of the

four different change operations appear in the changes list.

similarityScore is a float between 0 and 1.0 that is the ratio of NoChange operations

to the total number operations in the changes list.

4.3.2.7 Post-Alignment: Visual Display of Alignment

The showChanges method provides a visual display of how the two streams are related

via the changes list. For each ChangeTuple in the list, this method goes back to

the initial reference and changes the color of the element in reference and adds in

an id number as a lyric to that element if the change operation is not a NoChange.

The color is determined by the type of change operation. Green corresponds to an

insertion, red to deletion, and purple to substitution. The id number is the index of

the ChangeTuple in the list.

The figures in Appendix A shows the visual display of alignment between the Violin 1

parts of an excerpt of Eine Kleine Nachtmusik, K.525 by W.A. Mozart. and a MIDI

recording of the same piece.

Since this method has a lot of overhead, it is by default set not to run. It can be set

53

Page 54: An Efficient Score Alignment Algorithm and its Applications

to run for every alignment by passing in the argument show=True.

4.4 Fixer

The Fixer is the last part of the OMR/MIDI music stream alignment and correction

system. Its developmental work can be found in fixer.py. There is a base class

called OMRMidiFixer that takes as input the changes list that the StreamAligners

outputs, as well as the original OMR and MIDI Streams.

Currently, there are two child classes, EnharmonicFixer and DeleteFixer that inherit

from OMRMidiFixer and provide different fixing services.

The EnharmonicFixer corrects a very specific type of error that appears in OMR,

classified by Byrd as a pitch error [1]. What happens in this scenario is that the MIDI

score has the correct absolute pitch of a note, but might be spelled enharmonically,

due to the fact that MIDI does not encode key signatures. The OMR score has been

scanned incorrectly— a sharp is mistaken for a natural or vice versa. An emergent

property of this scenario is that the OMR note and the MIDI note could be on different

lines or spaces but probably no more than a few steps apart. The EnharmonicFixer

accounts for the various possibilities that arise in this situation and corrects the pitch

of the OMR note by transposing the note 1 or 2 half steps and by adding or removing

accidentals.

The DeleteFixer was designed to fit the specifications of the OpenScore project [10].

The goal of the OpenScore project is to open-source music with open source software

(like music21!). OpenScore will use a combination of computer and human power

to digitize classical music scores and put them in the public domain. One idea is

that software can identify wrong recognized notes in a scanned score and mark or

delete the entire measure that that note is in and pass it off to a human corrector to

re-transcribe the entire measure. The DeleteFixer could be the computer power in

this method of score correction that OpenScore is using.

54

Page 55: An Efficient Score Alignment Algorithm and its Applications

Any additional child Fixer classes would inherit from the parent OMRMIDIFixer class

and also correct very specific problems. For example, in the StaccatoFixer skeleton

code, would look for a specific pattern in the changes list that would indicate that

the OMR process failed to recognize a passage that is marked as staccato. The

StaccatoFixer class along with other children Fixer classes would look for simi-

lar patterns in the ChangeTuples list and correct the OMR stream based on this

information.

4.5 Tradeoffs and Design Decisions

In this section, I will justify some of my design choices and discuss tradeoffs in my

decisions.

4.5.1 Modularity vs. Simplicity

4.5.1.1 A Variety of Settings and a Longer Setup

All of the modules have many different settings with various parameters. The variety

of settings is what makes the Hasher and Aligner so powerful— by changing just a

couple of settings, both are able to solve very different problems.

However, this necessarily lends itself to the system being more complex because each

module must be able to handle various combinations of settings and their edge cases.

It also requires the user to be careful when selecting initial settings.

One alleviation to this problem is providing default settings to all the modules. These

default settings are what I believe are good general settings for the most common uses

of the modules.

55

Page 56: An Efficient Score Alignment Algorithm and its Applications

4.5.1.2 Length Agnosticism

There are two instances in which I purposefully built the system to be able to handle

variable length input— in building NoteHash and NoteHashWithReference tuples and

in building the OMRMIDICorrector to be able to process a stream with any number

of parts. Both of these decisions make the entire system more powerful because the

system is able to process many more different types of streams this way, but it also

makes the hashing and aligning process more convoluted. Fortunately for the user,

all of this is hidden under the hood. There are likely some performance hits that are

associated with this added complexity, but not to the point of inoperability. More

demonstrations on timing are shown in section 5.3.

4.5.2 Performance vs. Space and Object Overhead

4.5.2.1 NoteHashWithReference

The NoteHashWithReference object contains a reference to the original stream object.

Streams and their elements are complex objects, so NoteHashWithReference objects

have much more overhead in storage than their counterparts, NoteHash objects.

However, in setting up a Hasher object, the user has the decision to use either one.

4.5.2.2 Creating One Hasher and Many Aligners

During the correction process, only a single Hasher object is created and used and

reused to generate hashes for every part in the stream passed in. There was the option

of having Hasher object be one-time-use objects, but I chose to have the Hasher be

able to operate on multiple streams.

On the other hand, a new StreamAligner object is generated for every pair of parts

in the OMR and MIDI streams that are passed in.

56

Page 57: An Efficient Score Alignment Algorithm and its Applications

Between generating NoteHashWithReference objects for every hashed stream element

and StreamAligner objects for every pair of streams, there is a lot of space overhead

taken up in even just a single alignment and correction process. However, based

on experimental findings in which I tried to create these objects in C, perform the

comparisons in C, and then translate the results into Python, I concluded that the

limiting factor of the process was not in Pythonic object creation but rather the

comparison process that calculates Levenshtein edit distance. The results and findings

were discussed in more detail in section 3.3.

4.5.3 Manual User Input

The OMR and MIDI streams that are used as input to the system come from somewhere.

I have left the onus on the user to source .mid files and find scanned scores. I also

have left it to the user to use some kind of OMR tool to convert the scanned score and

the MIDI file into a musicXML form. Once in musicXML form, the user can use the

music21 library to parse the OMR and MIDI and convert them into music21 streams.

At the time of that this thesis was written, music21’s MIDI parsing was having issues

with quantization of streams, so I used museScore’s MIDI parsing functions instead.

I leave this as a task to the user instead of automating it because with practice,

humans can be much more precise and efficient with this process, and the necessity of

a (probably large and complicated) computer program to do the same thing decreases.

57

Page 58: An Efficient Score Alignment Algorithm and its Applications

58

Page 59: An Efficient Score Alignment Algorithm and its Applications

Chapter 5

Examples and Results

This chapter will describe the results of my work using an excerpt of of W. A. Mozart’s

Eine Kleine Nachtmusik, K.525- I.Allegro (referred to by just “K.525” from here

onwards) as an example. It will also use an excerpt of W. A. Mozart’s String Quartet

No.7 in E-flat major, K.160- I.Allegro (referred to by just “K.160” from here onwards)

as an example where K.525 doesn’t fit the specifications. This chapter will also look

at timing metrics of the system.

5.1 Eine Kleine Nachtmusik, K.525- I.Allegro, W.

A. Mozart

This section will go through an example of how the entire OMRMIDICorrector system

works on W. A. Mozart’s Eine Kleine Nachtmusik, K.525- I.Allegro.

59

Page 60: An Efficient Score Alignment Algorithm and its Applications

Figure 5-1: The first page of the scanned copy of the score found on IMSLP.

5.1.1 Musical Properties of K.525

I chose K.525 as the example piece for its many music properties that make it a

difficult but interesting piece to align.

60

Page 61: An Efficient Score Alignment Algorithm and its Applications

5.1.1.1 Bass and Cello Doubling

In the original scanned score, there are only four parts because the bass part doubles

the cello part one octave lower. However, in the MIDI encoding, there are five separate

voices. Thus, in the preprocessing step of the OMRMiDICorrector, it would be ideal

for the system to recognize this. Otherwise, in instances where the OMRMIDICorrector

could not pair up the parts between the input OMR stream and the input MIDI

stream, there would be an error thrown, and alignment and correction would not

happen.

5.1.1.2 Tremolos

Starting in measure 5 in the original scanned score, the second violin has tremolo

notes, instead of repeated 16th notes written out. The OMR parser in SmartScore

is not robust enough to recognize the slashes across the stem of the note as tremolo

markings and instead interprets it as a single note. The MIDI protocol has no way of

indicating tremolos, so the second violin’s notes in measure 5 are encoded as regular

16th notes. The Aligner is robust enough to align tremolo measures with non-tremolo

measures even though there is a huge discrepancy in the number of notes present in

the measure.

5.1.2 Preparing the Raw Input

The basic raw input of the system is an OMR score and a MIDI file. A scanned copy

of the score was found in IMSLP [7], the source for much free public domain sheet

music. The MIDI file was sourced from the Yale MIDI Archive.

I used SmartScore’s built-in OMR tool to convert the scanned score of K.525 into an

ENF file. Then I exported the post-OMR file as a musicXML file.

61

Page 62: An Efficient Score Alignment Algorithm and its Applications

Similarly for the K.525 MIDI file, I used SmartScore to parse the original file and

exported it as a musicXML file.

The last step of preparing the raw input is to parse the musicXML files with the

music21 library so that they are Stream objects. This can be done by passing in the

musicXML filepaths to the parse method in the converter class. We will call these

two parsed streams k525midiStream and k525omrStream.

from music21 import *

k525MidiFilepath = "pathto/k525mvmt1/miditoxml.xml"

k525OmrFilepath= "pathto/k525mvmt1/omrtoxml.xml"

k525midistream = converter.parse(k525MidiFilepath)

k525omrstream = converter.parse(k525OmrFilepath)

5.1.3 Running OMRMIDICorrector on K.525

After the raw input has been massaged into Stream objects, the aligning process is

simple. We create an instance of an OMRMIDICorrector object with the two streams

as parameters. For the purposes of this example, we will use the default hasher that

is built into the OMRMIDICorrector class. We will set the debugShow to True so

that we can visualize the alignment between pairwise streams. Finally, we will call

processRunner which will hash and align the parts in the stream. For now, we will

manually create the Fixer objects for stream fixing. The aligner is able to correctly

identify that the repeated 16th notes

from music21 import *

OMC = alpha.analysis.omrMidiCorrector

k525omc = OMC.OMRMIDICorrector(k525midistream, k525omrstream)

k525omc.debugShow = True

k525omc.processRunner()

The outputs of the showChanges function are all in Appendix B

62

Page 63: An Efficient Score Alignment Algorithm and its Applications

5.1.3.1 A Closer Look at Specifics of processrunner

A lot of magic happens within processRunner, all of which was described at a high

level in sections 4.1.1.1 - 4.1.1.4. Here are some salient details of the process in the

context of K.525 that I think deserve notice:

• Bass Doubles Cello Part - The OMR score has only four parts, whereas the

MIDI score has 5. The checkBassDoublesCello method in OMRMIDICorrector

is able to correctly identify this and thus does not reject the input on the basis

of having a mismatching number of parts in the stream.

• Rhythmic Fault Tolerance: Beginning Rests - The first measure in the OMR

score is a scanning and parsing error. The Aligner is robust enough to identify

the beginning rest as a insertion error.

• Rhythmic Fault Tolerance: Propagating Rhythmic Shifts - By measure 5 in the

Violin 2 part, we see that the OMR score has introduced a 16th note shift in the

rhythm which continues for a few measures.

• Notation Fault Tolerance: Tremolos - By measure 5 of the Violin 2 part and by

measure 10 of the Viola part, the repeated 16th notes get notated with tremolo

slashes in the original OMR score. Of course, the MIDI protocol doesn’t have

a way of representing tremolo, so the notes are written out in long form. The

corrector is able to match the written out repeated notes in the MIDI to the

(incorrectly) scanned corresponding notes in the OMR.

5.1.4 Example 1: Similarity Metrics in K.525

Recall that the similarityScore is the ratio of NoChange operations to the total

number of operations in the changes list. For each pair of streams in K.525, the

similarityScore and changesCount are listed below:

63

Page 64: An Efficient Score Alignment Algorithm and its Applications

similarityScore changesCount

Violin 1 0.51 {NoChange: 83, Ins: 30, Del: 10, Sub: 39}

Violin 2 0.30 {NoChange: 74, Ins: 125, Del: 9, Sub: 38}

Viola 0.72 {NoChange: 97, Ins: 21, Del: 9, Sub: 8}

Cello/Bass 0.87 {NoChange: 107, Del: 11, Sub: 5}

Table 5.1: similarityScore and changesCount for every pair of aligned streams

A visual examination of the pairs of streams suggests that these numbers and counts

look correct. The Cello/Bass parts have the fewest colorations to their notes and no

insertion changes at all. The least similar pair of parts is the Violin 2, which has

multiple cases of the tremolo problem described above as well as propagating rhythmic

shifts.

5.2 String Quartet No.7 in E-flat major, K.160-

I.Allegro, W. A. Mozart

The previous section showed the alignment between parts in Eine Kleine Nacht-

musik. In this section will use another piece by Mozart to demonstrate how the

EnharmonicFixer works on correcting an excerpt of the Violin 1 part of K.160. This

OMR score was also sourced from IMSLP [8].

5.2.1 Running EnharmonicFixer on K.160, Violin 1

After prepping the raw input files of K.160 in the same way as detailed above and run-

ning that through an OMRMIDICorrector instance, we will create an EnharmonicFixer

instance using just the Violin 1 OMR stream and MIDI stream and the associated

changes list.

64

Page 65: An Efficient Score Alignment Algorithm and its Applications

from music21 import *

k160MidiFilepath = "pathto/k160mvmt1/miditoxml.xml"

k160OmrFilepath = "pathto/k160mvmt1/omrtoxml.xml"

k160midiStream = converter.parse(k160MidiFilepath)

k160omrStream = converter.parse(k160OmrFilepath)

k160omc = OMRMIDICorrector(k160midiStream, k160omrStream)

k160omc.processRunner()

V1Changes = k160omc.changes[0]

V1Midi = k160omc.midiParts[0]

V1Omr = k160omc.omrParts[0]

k160EF = fixer.EnharmonicFixer(V1Changes, V1Midi, V1Omr)

k160EF.fix()

k160omcViolin1OmrStream.show()

In Appendix C, the first two figures show the alignment visualization of the Violin 1

part. The last figure shows the corrected OMR score after it has been changed by the

EnharmonicFixer.

5.2.2 Analysis of Correction

Visual inspection of figures C-1 and C-2 suggests that the OMR failed to pick up the

appropriate key signature of the movement and that is what lends to large number of

changes in the first 10 or so measures. Inspection of figure C-3 shows that many of

the pitch errors have been corrected using the EnharmonicFixer. Of course, the most

ideal scenario would be to recreate the key signature, but having the correct pitch is a

good start.

65

Page 66: An Efficient Score Alignment Algorithm and its Applications

5.3 Results: Timing Tests

This section will evaluate the runtime on each of the three major modules of the system

the Hasher, Aligner, andFixer as well as the entire system, OMRMIDICorrector which

is a combination of some number of instances of the three major modules. It will also

discuss how the system would scale to different sized inputs.

For the following calculations, assume that the length of the input MIDI stream and

the OMR stream are about the same, n, and that the number of parts in each of the

streams is p.

5.3.1 Runtime Analysis: Hasher

Setup of the Hasher object is in O(n) time. Building the list finalEltsToBeHashed

requires scanning the input stream in full. Recall that the tupleList is a list of the

stream element properties that will be hashed; call its length t. The runtime of the

hashStream method is then O(k · n).

Thus, a single use of the Hasher is in O(n) +O(k · n) = O(n) time for small k. This

is a reasonable assumption because the number of notes in a stream is likely to be far

greater than the number of possible qualities of stream elements that can be hashed.

5.3.2 Runtime Analysis: Aligner

Setup of a StreamAligner object takes O(n · n) = O(n2) time, dominated by

the populateDistanceMatrix function. Traceback in the calculateChangesList

method takes O(n + n) = O(n) time. The length of the changes list is then also

O(n). The runtime of showChanges is proportional to the length of the changes list,

calculated above to be O(n)

A single use of a StreamAligner object is in O(n2) +O(n) +O(n) = O(n2) time.

66

Page 67: An Efficient Score Alignment Algorithm and its Applications

5.3.3 Runtime Analysis: Fixer

The OMRMIDIFixer and the EnharmonicFixer both require negligible setup. The fix

function’s runtime is proportional to the length of the changes list input, shown above

to be O(n).

A single use of a Fixer object is in O(n) time.

5.3.4 Runtime Analysis: OMRMIDICorrector

For a single instance of an OMRMIDICorrector with OMR and MIDI streams of length

n and p parts, there are 2 · p textttHasher objects, p Aligner objects, and p Fixer

objects created. In total this comes out to a time of:

2 · p ·O(n) + p ·O(n2) + p ·O(n)

= p ·O(n2) + 3 ·O(n)

For small p, this collapses to O(n2).

5.3.5 Empirical Results

Using the the first movement of K.160, I timed some results using the % timeit

functionality of iPython. First I tested the major modules on a single stream, the

Violin 1 part of K.160. Then I tested the OMR/MIDI Correction system on all of the

K.160.

67

Page 68: An Efficient Score Alignment Algorithm and its Applications

5.3.6 %timeit: Hasher

After setting up a Hasher, % timeit reported an average time of 621 ms per call to

hash the MIDI stream and average time of 617 ms per call to hash the OMR stream.

The length of the hashed MIDI stream was 662 and and the length of the hashed

OMR stream was 669.

5.3.7 %timeit: Aligner

After setting up a StreamAligner object using the hashed MIDI stream and the

hashed OMR stream, I ran % timeit on the align function and got an average of

2.39 s per call. Since the align function itself hashes input streams if not already

hashed, I passed in the hashed streams. This produced more accurate measurement

of the time for the alignment process, separate from the hashing process.

5.3.8 %timeit: Fixer

I set up a EnharmonicFixer object using the changes list from the aligning step and

the original K.160 Violin 1 OMR and MIDI streams. Running % timeit on the fix

function returned an average of 12.7 ms per call.

5.3.9 %timeit: OMR/MIDI Corrector

After testing the timing of each of the major modules, I tested the timing of the

OMR/MIDI Correction system on the entire first movement of K.160. The average

time to hash, align, and fix the four instrumental parts in K.160 was 12.94 s.

An entire run of the OMR/MIDI Corrector system on K.160 require 8 calls to the

Hasher, 4 calls to the Aligner, and 4 calls to the Fixer. Using the empirical numbers

68

Page 69: An Efficient Score Alignment Algorithm and its Applications

from above, this gives an estimate of 14.56 seconds, which is in the range of the

12.94 seconds that %timeit gives us. The speedup in the empirical time versus the

calculated time likely comes from the fact that there is considerable setup that was

reused in calling the OMRMIDICorrector once instead of its constituent modules.

5.3.10 Scalability

If a single movement of a four-part Mozart quartet takes on the order of tens of seconds

to hash, align, and fix, then most multi-movement pieces for chamber ensembles should

take on the order of minutes to hash, align, and fix. Larger orchestral works might

take on the order of tens of minutes to hash, align, and fix. Especially for the first

system of its type, these preliminary timings of the OMRMIDICorrector have good

implications.

The bottleneck of the system lies in the Aligner. The method implemented in my

system is a basic string alignment algorithm, and there is considerable room for

speedup using techniques such as The Four Russian Algorithm [3].

5.3.11 Implications of Free Music

The lasting and tangible results of this research will hopefully be in helping providing

more free and high-quality musical scores to the public. With ISMLP alone having

over 350,000 scores that are freely available, there is a lot of potential for my work to

liberate otherwise sub-quality scanned musical scores and turn them into digital scores

that can be conveniently analyzed, replicated, and distributed. It’s my hope that this

work will be able to facilitate musical research, provide individuals and groups with

easy access to practice material, and help bring more music to all of our lives.

69

Page 70: An Efficient Score Alignment Algorithm and its Applications

70

Page 71: An Efficient Score Alignment Algorithm and its Applications

Appendix A

Alignment Visualization of K.525

71

Page 72: An Efficient Score Alignment Algorithm and its Applications

Figure A-1: Visual display of changes in Violin 1 of target (MIDI) stream.

72

Page 73: An Efficient Score Alignment Algorithm and its Applications

Figure A-2: Visual display of changes in Violin 1 of source (OMR) stream.

73

Page 74: An Efficient Score Alignment Algorithm and its Applications

74

Page 75: An Efficient Score Alignment Algorithm and its Applications

Appendix B

Alignment Visualization of K.525

75

Page 76: An Efficient Score Alignment Algorithm and its Applications

Figure B-1: Post-alignment visualization of an excerpt of the MIDI Violin 1 part in K.525-I.Allegro.

76

Page 77: An Efficient Score Alignment Algorithm and its Applications

Figure B-2: Post-alignment visualization of an excerpt of the OMR Violin 1 part in K.525-I.Allegro.

77

Page 78: An Efficient Score Alignment Algorithm and its Applications

Figure B-3: Post-alignment visualization of an excerpt of the MIDI Violin 2 part in K.525-I.Allegro.

78

Page 79: An Efficient Score Alignment Algorithm and its Applications

Figure B-4: Post-alignment visualization of an excerpt of the OMR Violin 2 part in K.525-I.Allegro.

79

Page 80: An Efficient Score Alignment Algorithm and its Applications

Figure B-5: Post-alignment visualization of an excerpt of the MIDI Viola part in K.525-I.Allegro.

80

Page 81: An Efficient Score Alignment Algorithm and its Applications

Figure B-6: Post-alignment visualization of an excerpt of the OMR Viola part in K.525-I.Allegro.

81

Page 82: An Efficient Score Alignment Algorithm and its Applications

Figure B-7: Post-alignment visualization of an excerpt of the MIDI Cello part in K.525-I.Allegro.

82

Page 83: An Efficient Score Alignment Algorithm and its Applications

Figure B-8: Post-alignment visualization of an excerpt of the OMR Cello and Bass part inK.525.

83

Page 84: An Efficient Score Alignment Algorithm and its Applications

84

Page 85: An Efficient Score Alignment Algorithm and its Applications

Appendix C

Alignment Visualization of Violin 1

part in K.160

85

Page 86: An Efficient Score Alignment Algorithm and its Applications

Figure C-1: Post-alignment visualization of the MIDI Violin 1 part in K.160.

86

Page 87: An Efficient Score Alignment Algorithm and its Applications

Figure C-2: Post-alignment visualization of the OMR Violin 1 part in K.160.

87

Page 88: An Efficient Score Alignment Algorithm and its Applications

Figure C-3: Post-enharmonic and pitch fixing visualization of the OMR Violin 1 part inK.160.

88

Page 89: An Efficient Score Alignment Algorithm and its Applications

Bibliography

[1] Donald Byrd and Jakob Grue Simonsen. Towards a Standard Testbed for OpticalMusic Recognition: Definitions, Metrics, and Page Images. Journal of New Music

Research, 44.3:169– 195, 2015.

[2] Maura Church and Michael Scott Cuthbert. Improving Rhythmic Transcriptionsvia Probability Models Applied Post-OMR. In Hsin-Min Wang, Yi-Hsuan Yang,and Jin Ha Lee, editors, Proceedings of the 15th Conference of the International

Society for Music Information Retrieval, pages 643– 647. ISMIR, 10, 2014.

[3] Algorithms for Computational Biology. Sequence Alignment andDynamic Programming, Lecture notes for 6.096. https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-096-algorithms-for-computational-biology-spring-2005/lecture-notes/lecture5_newest.pdf.

[4] Walter B Hewlett. A Base-40 Number-line Representation of Musical PitchNotation. Musikometrika, 4:1–14, 1992.

[5] IMSLP/Petrucci Library Free Public Domain Sheet Music. http://imslp.org/.Accessed: 2015-10-10.

[6] Peter Joslin. Music Retrieval based on Melodic Similarity. PhD thesis, UniversiteitUtrecht, 2007.

[7] Wolfgang Amadeus Mozart. Eine Kleine Nachtmusik, K. 525.http://ks.petruccimusiclibrary.org/files/imglnks/usimg/5/57/IMSLP01776-Mozart_EineKleineNachtmusik_Score.pdf. Accessed: 2017-05-23.

[8] Wolfgang Amadeus Mozart. String Quartet No.7 in E-flat major, K.160/159a.http://ks.petruccimusiclibrary.org/files/imglnks/usimg/f/f1/IMSLP64127-PMLP05214-Mozart_Werke_Breitkopf_Serie_14_KV160.pdf.Accessed: 2017-05-23.

89

Page 90: An Efficient Score Alignment Algorithm and its Applications

[9] Saul B. Needleman and Christian D. Wunsch. A general method applicable tothe search for similarities in the amino acid sequence of two proteins. Journal of

Molecular Biology, 48(3):443– 453, March 1970.

[10] Openscore. https://musescore.org/en/user/57401/blog/2017/01/11/introducing-openscore. Accessed: 2017-05-22.

[11] An excerpt of a multiple sequence alignment of tmem66 proteins.https://commons.wikimedia.org/wiki/File:An_excerpt_of_a_multiple_sequence_alignment_of_TMEM66_proteins.png. Accessed: 2017-05-10.

[12] Ana Rebelo, Ichiro Fujinaga, Filipe Paszkiewicz, Andre R. S. Marcal, CarlosGuedes, and Jaime S. Cardoso. Optical Music Recognition - State-of-the-Artand Open Issues. International Journal of Multimedia Information Retrieval,1(3):173–190, Feb 2012.

[13] David Temperley. An Algorithm for Harmonic Analysis. Music Perception: An

Interdisciplinary Journal, 15.1:31– 68, 1997.

[14] Vladimir Viro. Peachnote: Music Score Search and Analysis Platform. In AnssiKlapuri and Leider Kolby, editors, Proceedings of the 12th Conference of the

International Society for Music Information Retrieval, pages 359– 362. ISMIR,10, 2014.

[15] Christopher Wm. White. Some statistical properties of tonality, 1650-1900. PhDthesis, Yale University, 2013.

90