
Discotective
Katie Bouman, Brad Campbell, Michael Hand, Tyler Johnson, Joe Kurleto

An optical music recognition project developed by EECS 452, DSP Design Laboratory, Design Expo

Winter 2011

INTRODUCTION

When learning to read sheet music, many students have difficulty when they do not know how the music should sound. The Discotective, an embedded music recognition system, presents a solution to this problem through image processing. A user simply takes a picture of sheet music, and after a series of processing steps, the Discotective plays the song.

HARDWARE

The Discotective is built on an Altera DE2 FPGA. A 50 MHz softcore processor has been loaded onto the FPGA, allowing the system to run C programs. In addition, a 5 megapixel camera from Terasic captures images, which are used for processing and can be displayed on a VGA monitor. The final audio is output to speakers via the board’s audio codec.

SYSTEM OVERVIEW

DESIGN DETAILS

I. Image Acquisition

The Discotective first captures a 5 megapixel image, which is displayed on the VGA monitor in real time.

II. Preprocessing

Next the image is prepared for further processing. To ease memory and computational demands, the image is binarized. A simple transformation is also applied to correct for lens distortions. Because of the embedded system’s limited capabilities, more robust MATLAB prototype algorithms have been streamlined to work well on the FPGA.
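As a rough illustration of the binarization step, a global-threshold version in C is sketched below; the threshold value and image layout are assumptions, and the Discotective's actual preprocessing may choose the threshold differently.

    /* Sketch: global-threshold binarization of an 8-bit grayscale image
     * stored row-major. The threshold here is a caller-supplied guess;
     * the device's real preprocessing may derive it from the image. */
    #include <stddef.h>
    #include <stdint.h>

    void binarize(const uint8_t *gray, uint8_t *bin,
                  size_t width, size_t height, uint8_t threshold)
    {
        for (size_t i = 0; i < width * height; i++) {
            /* 1 = black (ink), 0 = white (paper) */
            bin[i] = (gray[i] < threshold) ? 1 : 0;
        }
    }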

III. Segmentation

After preprocessing, a top-down approach is used to isolate smaller symbols. Image projections are computed to find horizontal lines and detect individual staffs. Afterward, stafflines are removed, followed by stemmed notes, leaving an image of unconnected musical symbols. These symbols are then segmented using connected-component analysis.
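To make the projection step concrete, the sketch below counts black pixels in each row of the binarized image; rows crossing stafflines show up as sharp peaks. This is an illustrative version only, not the code that runs on the FPGA.

    /* Sketch: horizontal projection of a binary image (1 = black pixel).
     * proj[y] holds the number of black pixels in row y; staffline rows
     * produce pronounced peaks that can be used to locate staffs. */
    #include <stddef.h>
    #include <stdint.h>

    void horizontal_projection(const uint8_t *bin, size_t width,
                               size_t height, unsigned *proj)
    {
        for (size_t y = 0; y < height; y++) {
            unsigned count = 0;
            for (size_t x = 0; x < width; x++)
                count += bin[y * width + x];
            proj[y] = count;
        }
    }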

[Poster figures: original grayscale image and preprocessed image; the Altera DE2 FPGA with the Terasic 5 megapixel camera; a segmentation example showing the initial staff, the staff after line removal, and the remaining symbols; unsegmented staffs alongside the horizontal projection of the image]

PERFORMANCE

Overall, the Discotective performs well with simple music, such as that played by a middle school band. Due to the limits of our design, however, many symbols are difficult to classify. Slurs or ties connected to notes, for example, cannot be segmented properly with connected-component analysis. More complicated symbols, such as chords, repeat signs, accents, or text, will not be properly recognized by the classifier.

The hardware implementation introduces further limitations. The slow softcore processor rules out more advanced binarization and deskew preprocessing steps, and their absence reduces the accuracy of later algorithms. Nevertheless, the device still performs well under proper conditions.

CONTACT

Group members may be reached at [email protected].

DESIGN DETAILS (continued)

IV. Classification

The Discotective takes a tree-based approach to classification. Stemmed notes, which are easy to detect, are identified first. Then additional symbols, such as accidentals, dots, and rests, are classified by extracting certain features (see the feature list below).

V. Output

The Discotective produces sound output using direct digital synthesis. In essence, a sinusoidal look-up table is used to produce pure tones. The frequency of each tone can be tuned by changing the number of samples used in each sinusoidal period. Additional tonal qualities can be achieved by adding harmonic frequencies or adjusting the tone’s amplitude at specific times.
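One common way to realize this is a phase-accumulator look-up, sketched below. The table size, sample rate, and data types are assumptions for illustration; the Discotective's actual synthesis parameters may differ.

    /* Sketch: direct digital synthesis with a sine look-up table and a
     * 32-bit phase accumulator. TABLE_SIZE and SAMPLE_RATE are
     * illustrative choices, not the device's actual parameters. */
    #include <math.h>
    #include <stdint.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define TABLE_SIZE  256
    #define SAMPLE_RATE 48000.0

    static int16_t sine_table[TABLE_SIZE];

    void init_sine_table(void)
    {
        for (int i = 0; i < TABLE_SIZE; i++)
            sine_table[i] = (int16_t)(32767.0 * sin(2.0 * M_PI * i / TABLE_SIZE));
    }

    /* Generate n samples of a pure tone at freq_hz into out[]. The top
     * 8 bits of the phase accumulator index the table, so the output
     * frequency is step * SAMPLE_RATE / 2^32. */
    void generate_tone(double freq_hz, int16_t *out, int n)
    {
        static uint32_t phase = 0;
        uint32_t step = (uint32_t)(freq_hz / SAMPLE_RATE * 4294967296.0);

        for (int i = 0; i < n; i++) {
            out[i] = sine_table[phase >> 24];
            phase += step;
        }
    }

Amplitude envelopes or added harmonics, as mentioned above, could be layered on by scaling out[i] over time or by summing additional tables stepped at integer multiples of step.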

Some useful features:
• Symbol height
• Symbol width
• Presence of vertical lines
• Proximity to notes
• Proximity to stafflines
• Black-to-white pixel ratio
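To show how features like these might feed a tree-based classifier, the sketch below applies a few hand-written rules. The feature struct, thresholds, and symbol categories are invented for illustration and are not the Discotective's actual decision tree.

    /* Illustrative feature-based classification of a segmented symbol.
     * The fields mirror the feature list above; the thresholds and the
     * categories are hypothetical. staff_space is the gap between
     * adjacent stafflines, in pixels. */
    #include <stdbool.h>

    typedef struct {
        int    height;            /* symbol height in pixels            */
        int    width;             /* symbol width in pixels             */
        bool   has_vertical_line; /* presence of a vertical line        */
        int    dist_to_note;      /* proximity to the nearest note      */
        int    dist_to_staffline; /* proximity to the nearest staffline */
        double black_white_ratio; /* black-to-white pixel ratio         */
    } symbol_features_t;

    typedef enum { SYM_UNKNOWN, SYM_DOT, SYM_ACCIDENTAL, SYM_REST } symbol_t;

    symbol_t classify_symbol(const symbol_features_t *f, int staff_space)
    {
        /* Small, nearly square, dense blob next to a note: augmentation dot. */
        if (f->height < staff_space && f->width < staff_space &&
            f->black_white_ratio > 0.6 && f->dist_to_note < 2 * staff_space)
            return SYM_DOT;

        /* Tall symbol with a vertical line close to a note: accidental. */
        if (f->has_vertical_line && f->height > 2 * staff_space &&
            f->dist_to_note < 2 * staff_space)
            return SYM_ACCIDENTAL;

        /* Symbol sitting on the staff with no note nearby: a rest. */
        if (f->dist_to_staffline < staff_space && f->dist_to_note > 4 * staff_space)
            return SYM_REST;

        return SYM_UNKNOWN;
    }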