
DEPARTMENT OF COMPUTER SCIENCE

FISH IDENTIFICATION SYSTEM

By

Diego Mushfieldt

A mini-thesis submitted in partial fulfilment of the requirements for the degree of

BSc. Honours

Supervisor: Mehrdad Ghaziasgar

Co-Supervisor: James Connan

Date: 2011-11-02

Acknowledgements

I would like to take this opportunity to thank my parents for their constant motivation and inspiration and for granting me the opportunity to study at the University of the Western Cape. I would also like to thank those dear to my heart including my family for always having the patience and understanding when I could not always be in their presence. Last, but certainly not least, I want to thank my supervisors, Mehrdad Ghaziasgar and James Connan, for believing in me and giving me the confidence I needed to keep on working so hard at this project. Your dedication is truly an inspiration to me and to other students at UWC.

ABSTRACT

Aquariums currently display a wide variety of fish, and visitors regularly want to know more about a particular kind of fish. Visitors can currently obtain this information by asking an expert or by scanning through the documentation in and around the aquarium. However, this information is not always readily available. Therefore, it is desirable to create a system that makes such information readily available in an interactive manner. The project aims to develop a system that uses a video stream of a wide range of fish as its input. The user clicks on a particular fish, and the system classifies the fish and displays information about it accordingly. The system aims to give users an enjoyable and educational experience by allowing them to interact with it via the click of a mouse. The project is scalable enough to incorporate functionality such as a touch screen in order to improve interaction and enhance the user experience.

Table of Contents

CHAPTER 1: INTRODUCTION
  1.1 Computer Vision and Image Processing
  1.2 OpenCV (Open Computer Vision)
  1.3 Current Research

CHAPTER 2: USER REQUIREMENTS
  2.1 User’s view of the problem
  2.2 Description of the problem
  2.3 Expectations from the software solution
  2.4 Not expected from the software solution

CHAPTER 3: REQUIREMENTS ANALYSIS
  3.1 Designer’s Interpretation and Breakdown of the Problem
  3.2 Complete Analysis of the Problem
  3.3 Current Solutions
  3.4 Suggested Solution

CHAPTER 4: USER INTERFACE SPECIFICATION (UIS)
  4.1 The complete user interface
  4.2 The input video frame
  4.3 How the user interface behaves

CHAPTER 5: HIGH LEVEL DESIGN (HLD)
  5.1 Description of Concepts
  5.2 Relationships between objects
  5.3 Subsystems of HLD
  5.4 Complete Subsystem

CHAPTER 6: LOW LEVEL DESIGN (LLD)
  6.1 Low Level Description of Concepts
  6.2 Detailed Methodology

CHAPTER 7: TESTING AND RESULTS

CHAPTER 8: USER MANUAL
  8.1 Starting the System
  8.2 Load Video

CHAPTER 9: CODE DOCUMENTATION

CHAPTER 1

INTRODUCTION

1.1 Computer Vision and Image Processing

Computer vision [1] is the study of techniques that can be used to make machines see. In this context ‘see’ refers to a machine being able to extract from an image the information necessary to solve some task. The image can take many forms, such as a video sequence or views from multiple cameras. Image processing [2] is any kind of signal processing in which the input is an image and the output is either another image or a set of parameters related to the image.

1.2 OpenCV (Open Computer Vision)

OpenCV [3] is an open source computer vision library written in C and C++. OpenCV runs on Linux, Windows and Mac OS. The library helps people build complicated vision applications with its simple-to-use vision infrastructure, which contains over 500 functions. OpenCV also contains a machine learning library, since computer vision and machine learning often go hand in hand, and this can be applied to a wide range of machine learning problems.

1.3 Current Research

Information about specific fish within an aquarium is not always readily available. At the moment people can obtain information either by scanning the documentation in the aquarium or by asking an expert. Therefore, it is desirable to develop a system that provides instant information about specific fish in an interactive manner. The proposed system identifies a fish, using OpenCV’s libraries, by creating an image of the fish when the user clicks on it with a mouse. The image is processed using various algorithmic techniques and the necessary information is then displayed on the screen.

CHAPTER 2

USER REQUIREMENTS

The following section describes the problem from the user’s point of view. It is critical to gather information from the user in order to produce a meaningful solution.

2.1 User’s view of the problem

The user requires the system to provide an easy mechanism for selecting a fish (in this case via a click of a mouse) on a live or pre-recorded video stream. The system should be able to classify the fish when it is clicked on, while it is in motion within the video stream. The system should also be capable of providing the user with information that is structured in a sensible manner and easy to understand. It is very important that the system is user friendly and presents the information with clarity, so that the provided information remains unambiguous.

2.2 Description of the problem

The main purpose of this project is to develop an interactive system that is capable of providing instant feedback about a particular fish which the user is interested in. The system should be able to assist in educating its users about different fish species by presenting certain facts about the particular type of fish that the user is interested in.

2.3 Expectations from the software solution

The system is expected to classify one fish at a time when the user clicks on it in the live video stream. The focus of this project is not only on classifying the fish, but also on the kind of information displayed to the user as well as the manner in which information is displayed.

2.4 Not expected from the software solution

The system is not expected to display information about more than one fish at the same time. Therefore, the system can only process and perform analysis on one fish at a time. Since one camera is used to capture the fish, the system can only process the fish in two dimensions with only one camera angle. Therefore the system is not expected to do its processing in three dimensions.

CHAPTER 3

REQUIREMENTS ANALYSIS

The following section describes the problem from the designer’s perspective and uses the previous chapter (CHAPTER 2) as a starting point.

3.1 Designer’s Interpretation and Breakdown of the Problem

The aquarium hosts many visitors each year and there are various fish on display. However, viewers are not always able to obtain instant information about the specific fish which they are interested in learning more about. The input to the final system is a live or pre-recorded video stream. Using a live video feed rather than a recorded video file is ideal and more practical, but difficult to implement in terms of the efficiency of algorithms and the number of frames per second. A camera is used to capture/record the fish while they are swimming. This allows the user to observe the fish and decide which particular fish he/she is interested in learning more about. The user interacts with the system by moving the mouse cursor over a specific fish and clicking on it. The location of the click is used to determine which fish was clicked on, and an image of that fish is created. The system uses image processing techniques and functions from the OpenCV (Open Computer Vision) libraries in order to classify the fish accordingly. The system then displays the necessary output on the screen. The difficulty lies in processing the image to determine which fish was clicked on, and this must be done fast enough to ensure real-time processing.

3.2 Complete Analysis of the Problem

3.2.1 Recording the fish in real-time

A camera is used in order to record the fish swimming in the fish tank. This input from the camera is used by the system in order to display the live video feed of the fish in two dimensions on a screen, and to enable the user to select a particular fish via the click of a mouse.

3.2.2 Processing the image of the selected fish

After the user clicks on a specific fish, the location of the click is used by the system in order to determine which fish was clicked on.

The region of interest (ROI) will be of importance in order to achieve this goal.

3.2.3 Displaying information to the user

The system is not required to display critical biological features of each fish. However, the information should be precise enough in order to educate the user but concise enough to ensure that the user does not feel overwhelmed by too much information. Therefore, the system only displays the fish type.

3.3 Current Solutions

There are a few systems that are similar to this one. They are used to conduct fish surveys and to count fish in order to protect marine ecosystems. However, no existing system solves exactly the same problem as the proposed system. The similar systems mentioned above do, however, make use of functions and techniques that are also needed by the Fish Identification System. Some of these systems are described below.

3.3.1 Real-time fish detection based on improved adaptive background [4]

This system is a kind of fish behaviour monitoring system. The system proposes a new approach to update the background, which is based on frame difference and background difference, in order to detect the fish in a real-time video sequence. The system combines the background difference and frame difference to update the background more correctly and completely using shorter computation times.

3.3.2 Recognizing Fish in Underwater video [5]

This system uses a deformable template object recognition method for classifying fish species in an underwater video. The method used can be a component of a system that automatically identifies fish by species, improving upon previous works, which only detect and track fish. In order to find globally optimal correspondences between the template model and an unknown image, Distance Transforms are used. Once the query images have been transformed into estimated alignment with the template, they are processed to extract texture properties.

3.3.3 Field Programmable Gate Array (FPGA) Based Fish Detection Using Haar Classifiers [6]

The quantification of the abundance, size, and distribution of fish is critical to properly manage and protect marine ecosystems and regulate marine fisheries. This system is designed to automatically detect fish using a method based on the Viola-Jones Haar-like feature object detection method on a field programmable gate array (FPGA). The method generates Haar classifiers for different fish species by making use of OpenCV’s Haar training code, which is based on the Viola-Jones detection method. This code allows a user to generate a Haar classifier for any object that is consistently textured and mostly rigid.

3.4 Suggested Solution

The system will work effectively at classifying various types of fish. The suggested solution is easy to modify such that additional functionality can be added to it when necessary. It is also cost-effective since only one camera is used as well as open-source software (OpenCV).

CHAPTER 4

USER INTERFACE SPECIFICATION (UIS)

The following section describes exactly what the user interface is going to do, what it looks like and how the user interacts with the program.

4.1 The complete user interface

The complete user interface is a Graphical User Interface (GUI). Text commands are not used by the user to interact with the system. The figure below shows the user interface as it appears to the user.

Figure 1: User Interface

4.2 The input video frame

Once the system starts running, the video feed will be displayed to the user within a window on the screen as shown above. The user can now click on any fish within this window.

4.3 How the user interface behaves

The system will display the video feed once it is executed. It then waits for input from the user via the click of a mouse. If the user clicks on a fish, the system will respond by displaying an additional window that shows the classification of the fish.

CHAPTER 5

HIGH LEVEL DESIGN (HLD)

In this section an HLD view of the problem will be applied. Since the programming language of choice is C/C++, Object Oriented Analysis is not applied. A very high-level abstraction of the system is constructed as we analyse the methodology behind the construction of the system.

Figure 2: Behaviour of User Interface

5.1 Description of Concepts

Consider the system objects and their corresponding descriptions in the following table:

OBJECT: DESCRIPTION

FFmpeg: FFmpeg is free and provides libraries for handling multimedia data. It is a command-line program used for transcoding multimedia files.

OpenCV: OpenCV is a library of programming functions aimed mainly at real-time computer vision and real-time image processing.

BGR2HSV: Converts a frame from the BGR colour space to the HSV colour space, which is less sensitive to dynamic lighting conditions.

Adaptive Threshold: This is the simplest method of image segmentation. The technique is used to create a binary image in which there exist only black and white pixels; this method also includes ways to adapt to dynamic lighting conditions.

Region of Interest (ROI): The ROI is set around the fish to segment it, since it is only the fish that is of interest, not the entire image. The coordinates of the user’s click are used to set the ROI.

Contour Detection: The edge pixels are assembled into contours. The largest contour is detected and is the only contour used to represent the final shape of the fish.

Histogram: The Histogram represents the distribution of colour within an image.

Support Vector Machine (SVM): An SVM is used to recognize patterns in the intensity of the pixels and to classify which class a certain sample belongs to, making its decision based on the data analysis. The ROI as well as the Histogram values are sent to the SVM for training and testing the system.

Table 1: System Objects and their descriptions

5.2 Relationships between objects

The figure below depicts the relationships between the objects:

5.3 Subsystems of HLD

5.4 Complete Subsystem

The figure below shows the high level design and its sub-components, which include more detail about the subsystem.

Figure 3: Object Relations

Figure 4: Subsystems

Figure 5: Complete Subsystem

CHAPTER 6

LOW LEVEL DESIGN (LLD)

In this section explicit detail of all data types and functions will be provided. Pseudo code will also be provided, covering all aspects of the programming effort without resorting to actual code.

6.1 Low Level Description of Concepts

CLASS: ATTRIBUTES

BGR2HSV: cvCvtColor(bgrImg, hsvImg, CV_BGR2HSV)

Adaptive Threshold: cvAdaptiveThreshold(hsvImg, hsvThresh, CV_ADAPTIVE_THRESH_MEAN_C, CV_ADAPTIVE_THRESH_GAUSSIAN_C, 139, 0)

Region of Interest (ROI): cvResize(hsvThresh, threshROI)

Contour Detection: cvFindContours(threshROI, storage, &first_contour, sizeof(CvContour), CV_RETR_CCOMP)

Draw Histogram: DrawHistogram(histImg)

Table 2: Low Level view

6.2 Detailed Methodology

This section will emphasise the methodology used to create this system by analysing the detail of each component.

6.2.1 Video feed

The figure below depicts how the video feed is captured. The consecutive frames make up the video which is displayed on the user’s monitor. This is illustrated below in Figure 6.

Figure 6: Video feed

6.2.2 User clicks on the fish

Processing starts once the user clicks on the fish.

6.2.3 BGR (Blue, Green, Red) to HSV (Hue, Saturation, Value)

Once a frame is captured from the video feed, it is converted from BGR to HSV. This is done because the HSV colour space is not as sensitive to dynamic lighting conditions as BGR.

Figure 7: User clicks on fish

Figure 8: Convert BGR to HSV
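The thesis performs this conversion with OpenCV’s cvCvtColor(bgrImg, hsvImg, CV_BGR2HSV). As a minimal, dependency-free sketch of what that step computes, the following C++ function converts a single 8-bit BGR pixel to HSV using OpenCV’s 8-bit convention (hue halved into [0, 180)). The function name and data layout are illustrative, not the project’s actual code.

```cpp
#include <algorithm>

// One 8-bit HSV pixel in OpenCV's convention:
// H in [0, 180), S and V in [0, 255].
struct Hsv { int h, s, v; };

// Convert a single 8-bit BGR pixel to HSV.
Hsv bgrToHsv(int b, int g, int r) {
    int v = std::max({b, g, r});         // value = brightest channel
    int m = std::min({b, g, r});
    int c = v - m;                       // chroma
    int s = (v == 0) ? 0 : 255 * c / v;  // saturation scaled to [0, 255]
    double h = 0.0;
    if (c != 0) {
        if (v == r)      h = 60.0 * (g - b) / c;
        else if (v == g) h = 60.0 * (b - r) / c + 120.0;
        else             h = 60.0 * (r - g) / c + 240.0;
        if (h < 0.0) h += 360.0;
    }
    return { static_cast<int>(h / 2.0), s, v };  // hue halved to fit 8 bits
}
```

Because hue is computed from chroma rather than raw intensity, two pixels of the same colour under different lighting land close together in H, which is exactly why HSV is preferred here.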

6.2.4 Adaptive Threshold

This method takes individual pixels and marks them as object pixels if their value is greater than some threshold value, and as background pixels otherwise. The resulting image is a binary image consisting only of black and white pixels. The most important part is the selection of the threshold value. In this system the threshold value is selected manually, using trial and error to observe which value removes the most noise. The Hue component of the HSV colour space is used, and the adaptive threshold is applied to this single-channel image.

Figure 9: Adaptive Threshold
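The thesis calls OpenCV’s cvAdaptiveThreshold for this step. As a self-contained sketch of the underlying idea, the following C++ function applies a mean-based adaptive threshold: each pixel is compared to the mean of its local neighbourhood rather than to one global value, which is what lets the method adapt to uneven lighting. The window radius and offset constant are illustrative parameters, not the thesis’s tuned values.

```cpp
#include <vector>

// Mean-based adaptive threshold on a single-channel image:
// a pixel becomes white (255) when it exceeds the mean of its
// (2*radius+1)^2 neighbourhood minus a constant c, else black (0).
std::vector<std::vector<int>> adaptiveThreshold(
        const std::vector<std::vector<int>>& img, int radius, int c) {
    int rows = img.size(), cols = img[0].size();
    std::vector<std::vector<int>> out(rows, std::vector<int>(cols, 0));
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            long sum = 0;
            int n = 0;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    int yy = y + dy, xx = x + dx;
                    if (yy >= 0 && yy < rows && xx >= 0 && xx < cols) {
                        sum += img[yy][xx];  // accumulate the local window
                        ++n;
                    }
                }
            }
            double mean = static_cast<double>(sum) / n;
            out[y][x] = (img[y][x] > mean - c) ? 255 : 0;
        }
    }
    return out;
}
```

A pixel much brighter than its surroundings (a fish against darker water) survives as white even when the overall scene brightness drifts from frame to frame.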

6.2.5 Region of Interest

This method uses x and y coordinates to set borders around the object of interest (the fish). The larger image is then cropped in order to do further segmentation on a smaller image, in which only the fish is displayed.
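The cropping described above can be sketched as a small self-contained C++ helper: it takes the click coordinates, builds a square window around them, clamps the window to the image borders, and copies out the sub-image. The fixed half-size parameter is an illustrative assumption, not the thesis’s actual ROI sizing.

```cpp
#include <algorithm>
#include <vector>

// Crop a square region of interest of half-size r around the click
// point (cx, cy), clamped to the image borders.
std::vector<std::vector<int>> cropRoi(const std::vector<std::vector<int>>& img,
                                      int cx, int cy, int r) {
    int rows = img.size(), cols = img[0].size();
    int x0 = std::max(0, cx - r), x1 = std::min(cols - 1, cx + r);
    int y0 = std::max(0, cy - r), y1 = std::min(rows - 1, cy + r);
    std::vector<std::vector<int>> roi;
    for (int y = y0; y <= y1; ++y)       // copy each row slice of the window
        roi.push_back(std::vector<int>(img[y].begin() + x0,
                                       img[y].begin() + x1 + 1));
    return roi;
}
```

Clamping matters: a click near the edge of the frame yields a smaller ROI instead of reading out of bounds.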

6.2.6 Contour Detection and Flood Fill

Figure 10: Set ROI

This method computes contours from a binary image. In this case the binary image is the threshold image, in which the image edges are implicit as boundaries between positive and negative regions. The largest contour, which is the shape of the fish, is extracted to remove background noise, and it is then filled with white pixels to represent the shape of the fish.
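The thesis uses cvFindContours plus a flood fill for this step. A rough, self-contained approximation of "keep the largest contour and fill it" is to keep only the largest connected white region of the binary image, which the following C++ sketch does with a breadth-first flood fill; it is an illustration of the idea, not the OpenCV implementation.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Keep only the largest 4-connected white (255) region of a binary image.
// This approximates "extract the largest contour and flood-fill it":
// every pixel outside the biggest blob is cleared to black.
std::vector<std::vector<int>> largestRegion(std::vector<std::vector<int>> img) {
    int rows = img.size(), cols = img[0].size();
    std::vector<std::vector<int>> label(rows, std::vector<int>(cols, 0));
    int best = 0, bestSize = 0, next = 0;
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            if (img[y][x] != 255 || label[y][x] != 0) continue;
            int id = ++next, size = 0;
            std::queue<std::pair<int, int>> q;
            label[y][x] = id;
            q.push(std::make_pair(y, x));
            while (!q.empty()) {            // breadth-first flood fill
                std::pair<int, int> p = q.front();
                q.pop();
                ++size;
                const int dy[] = {1, -1, 0, 0}, dx[] = {0, 0, 1, -1};
                for (int k = 0; k < 4; ++k) {
                    int ny = p.first + dy[k], nx = p.second + dx[k];
                    if (ny >= 0 && ny < rows && nx >= 0 && nx < cols &&
                        img[ny][nx] == 255 && label[ny][nx] == 0) {
                        label[ny][nx] = id;
                        q.push(std::make_pair(ny, nx));
                    }
                }
            }
            if (size > bestSize) { bestSize = size; best = id; }
        }
    }
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
            img[y][x] = (label[y][x] == best) ? 255 : 0;
    return img;
}
```

Small speckles of thresholding noise form their own tiny components and are wiped out, leaving only the fish silhouette.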

6.2.7 Histogram

The Histogram values are used to represent the colour distribution of the fish. The Figure below illustrates how dominant ‘orange’ is within the RGB image.
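The colour-distribution step can be sketched in a few lines of self-contained C++: count how many pixels fall into each hue bin over OpenCV’s 8-bit hue range. The bin count is an illustrative choice, not the thesis’s configuration.

```cpp
#include <vector>

// Build a hue histogram with `bins` bins over OpenCV's 8-bit hue
// range [0, 180); the bin counts describe the colour distribution.
std::vector<int> hueHistogram(const std::vector<int>& hues, int bins) {
    std::vector<int> hist(bins, 0);
    for (int h : hues)
        ++hist[h * bins / 180];  // map each hue to its bin
    return hist;
}
```

For a predominantly orange fish, the bins covering the orange hues would dominate the histogram, which is the signal the classifier later relies on.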

Figure 11: Contour Detection and Flood Fill

6.2.8 Sending the Shape and Colour representations to the Support Vector Machine (SVM) [7]

Once the shape and colour distribution of the fish have been determined, this data is combined and sent to the SVM. Each fish is given a label (e.g. fish A has label 1, fish B has label 2, etc.) and the corresponding features (shape and colour) are combined for each label. The SVM then trains the system to recognize all the fish that are clicked on, each fish having its own unique set of features.
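The labelling and feature combination described above can be sketched as follows: each training sample pairs an integer class label with the shape and colour features concatenated into one feature vector. The structure and function names are illustrative, not the project’s actual code.

```cpp
#include <vector>

// One labelled training sample for the SVM: the shape and colour
// features are concatenated into a single feature vector.
struct Sample {
    int label;                     // e.g. 1 = fish A, 2 = fish B, ...
    std::vector<double> features;
};

Sample makeSample(int label,
                  const std::vector<double>& shape,
                  const std::vector<double>& colour) {
    Sample s;
    s.label = label;
    s.features = shape;            // shape features first
    s.features.insert(s.features.end(), colour.begin(), colour.end());
    return s;                      // then the colour histogram values
}
```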

Figure 12: Draw Histogram

6.2.8.1 SVM Cross-validation and Grid-search [7]

The RBF kernel nonlinearly maps samples into a higher-dimensional space, so unlike the linear kernel it can handle the case where the relation between class labels and attributes is nonlinear [7]. The two parameters C and γ are the parameters of the RBF kernel. Some form of model selection needs to be done in order to decide which C and γ are best for a given problem; the aim is to choose a good C and γ in order to accurately predict testing data. In v-fold cross-validation, the training set is divided into v subsets of equal size. Each subset is tested in turn using the classifier trained on the remaining v-1 subsets. Therefore, each instance of the whole training set is predicted once, so the cross-validation accuracy is the percentage of data that are correctly classified. This kind of cross-validation procedure can prevent the overfitting problem. Figure 14 below illustrates the overfitting problem, whereby the classifier overfits the training data.

In contrast, the classifier shown in Figure 13 does not overfit the training data and gives better cross-validation as well as testing accuracy.

Figure 13: Better Classifier

A “grid-search” on C and γ using cross-validation is recommended. Different pairs of (C, γ) values are tried and the pair with the best accuracy is chosen. In order to identify good parameters, exponentially growing sequences of C and γ are tried; for example, C = 2^-5, 2^-3, …, 2^15 and γ = 2^-15, 2^-13, …, 2^3.
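The grid-search loop itself is simple enough to sketch in self-contained C++: iterate over the exponentially growing (C, γ) pairs and keep the pair with the highest cross-validation score. The scoring callback is a stand-in for training and cross-validating an RBF SVM, which is not reproduced here.

```cpp
#include <cmath>
#include <functional>
#include <limits>
#include <utility>

// Exhaustive grid search over exponentially growing (C, gamma) pairs,
// keeping the pair with the highest cross-validation accuracy. The
// cvAccuracy callback stands in for "train an RBF SVM on v-1 folds
// and test on the held-out fold, averaged over all folds".
std::pair<double, double> gridSearch(
        const std::function<double(double, double)>& cvAccuracy) {
    double bestC = 0.0, bestGamma = 0.0;
    double bestAcc = -std::numeric_limits<double>::infinity();
    for (int i = -5; i <= 15; i += 2) {      // C = 2^-5, 2^-3, ..., 2^15
        for (int j = -15; j <= 3; j += 2) {  // gamma = 2^-15, ..., 2^3
            double C = std::pow(2.0, i);
            double gamma = std::pow(2.0, j);
            double acc = cvAccuracy(C, gamma);
            if (acc > bestAcc) { bestAcc = acc; bestC = C; bestGamma = gamma; }
        }
    }
    return std::make_pair(bestC, bestGamma);
}
```

The coarse exponential spacing keeps the search cheap (11 x 10 = 110 candidate pairs) while still spanning many orders of magnitude of both parameters.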

6.2.8.2 Training the System

Figure 14: Overfitting problem

The videos used in the system were captured at the aquarium. The camera was placed on a tripod in order to keep it stationary. Since the tanks within the aquarium are so large, it is not easy to record a fish swimming at a constant distance from the camera at all times. Therefore, the only frames used in the system are those in which the fish appears at a reasonable distance from the camera and maintains this distance for at least three or four seconds. Since most fish appeared only once in the videos at the desired distance, some of the training and testing videos are the same. This is acceptable since the duration of this project is only one year and, though not impossible, capturing separate training and testing videos is a complicated and tedious process. In order to have completely different training and testing sets, the tank should not be too large, because the shape and orientation of a fish change as it swims away from the camera. Each fish was trained with a total of about 40 training samples, including both the shape and colour data sent to the SVM. Each label in the SVM corresponds to a certain fish name, so when testing takes place, the SVM returns a value (the label) that is stored in a file; the system reads that label and prints the desired output to make the classification.

6.2.9 System Classification

If the user clicks on a particular fish, its features (shape and colour distribution) are sent to the SVM. Since the system has been trained prior to testing, the SVM allows the system to know approximately what each fish’s features ‘look’ like. The SVM responds with a label; this label corresponds to a certain fish species, and the corresponding classification output is displayed to the user. Figure 16 below illustrates the classification process.

Figure 15: Send Shape and Colour Distribution to SVM

Figure 16: System Classification

CHAPTER 7

TESTING AND RESULTS

In order to correctly assess the accuracy of the system, each fish was clicked on at least 10 times, amounting to a total of 200 clicks: 10 clicks for each of 20 fish. The results of the test are represented in the graph below. The graph shows the accuracy of each individual fish, i.e. the percentage of clicks for which that fish was classified correctly; these individual accuracies contribute to the overall accuracy of the system. The overall accuracy of the system amounts to 88%. This is a reasonable result taking into account the number of different types of fish that need to be classified.
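The overall figure pools the correct classifications across all fish and divides by the total number of test clicks. A minimal sketch of that calculation (the per-fish counts used in any example are hypothetical; only the pooled total matters):

```cpp
#include <vector>

// Pooled accuracy over all test clicks: total correct classifications
// divided by total clicks (clicksPerFish test clicks per species).
double overallAccuracy(const std::vector<int>& correctPerFish,
                       int clicksPerFish) {
    int correct = 0;
    for (int c : correctPerFish)
        correct += c;                        // sum correct clicks over all fish
    return 100.0 * correct /
           (static_cast<double>(correctPerFish.size()) * clicksPerFish);
}
// e.g. 176 correct out of 20 fish x 10 clicks = 200 total gives 88%.
```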

[Bar graph: Accuracy of Individual Fish. The x-axis lists the fish types tested, including small goldfish, large goldfish, silver goldfish, white/orange longhorn, orange clownfish, brownspotted fish, bird wrasse (male), slender snipefish, teardrop, threadfin, evileye puffer, semicircle angelfish, pink chromis, Midas blenny, cleaner wrasse, sea goldie, convict surgeonfish, racoon, and majestic angelfish; the y-axis shows the classification accuracy as a percentage from 0% to 100%.]

Figure 17: Individual Accuracy

In the above graph it is evident that some fish achieve an accuracy of 100%. This may be because these particular fish have very distinctive shapes and outstanding shape features. There are also fish with a low accuracy of 50% or less; this may be because their shapes are similar to those of other fish. Nevertheless, the system’s performance is good overall.

CHAPTER 8

USER MANUAL

The demonstration mode of the GUI is illustrated in the figures that follow.

8.1 Starting the System

A window will appear at start-up, asking whether or not the user wants to start the system. The system will start if the user clicks ‘Yes’ and will exit if the user clicks ‘No’.

8.2 Load Video

The system now requests a video from the user. After selecting a video, the user clicks ‘Open’. The system then uses the selected video and displays it on the screen. The user can now interact with the system by clicking on any fish within the video.

CHAPTER 9

CODE DOCUMENTATION

The code has been fully documented: comments were inserted at each statement and each method. A description of all inputs and outputs is given, as well as the caveats of each method. The final source code is stored on a CD and placed in an envelope.

Conclusion

In Chapter 2 a detailed description of the problem is given, as well as the software solution to the problem. The user requires an easy-to-use, interactive system. Since the system includes a GUI, it is simple and easy to use: the user navigates through the system by mouse clicks instead of typed commands, which also makes the system interactive. The problem stated is that visitors at the aquarium do not have instant access to information about specific fish. The final system clearly meets this requirement by providing an easy-to-use, interactive system that gives the user instant information about specific fish. Such a system is also educational and attracts people because it is interactive.

Bibliography

[1] Haslum, P. (n.d.). Computer Vision.

[2] Rapp, C. S. (1996). Image Processing and Image Enhancement. Johnson City, Texas, USA.

[3] Kaehler, G. B. (2008). Learning OpenCV. USA.

[4] Zhou Hongbin, X. G. (n.d.). Real-time fish detection based on improved adaptive background. HangZhou, Zhejiang Province, China.

[5] Andrew Rova, G. M. (n.d.). Recognizing Fish in Underwater Video.

[6] Bridget Benson, J. C. (n.d.). Field Programmable Gate Array (FPGA) Based Fish Detection Using Haar Classifiers. California San Diego, USA.

[7] Chih-Wei Hsu, C.-C. C.-J. (2003). A Practical Guide to Support Vector Classification. Taipei, Taiwan.