smart garbage classification system


Smart garbage classification system

Nguyen Ngoc Bao - Nguyen Sieu High School, Ha Noi - Viet Nam

E-mail: [email protected], Phone: +84986915640

Nguyen Ngoc Le Minh - Vinschool the Harmony High School, Ha Noi - Viet Nam

E-mail: [email protected] , Phone: +84967548724

Supervisor: Dr. Le Quang Thao - VNU, University of Science, Ha Noi - Viet Nam

E-mail: [email protected], Phone: +84.983.712.941

Abstract

Pollution is a major problem of the 21st century and an obstacle on the path to sustainable development. As the human population continues to rise, more and more garbage is produced. This not only seriously affects the environment but also damages human health, leading to salmonella infection, food poisoning, fever, etc.

This project aims to present an effective solution to the waste management problem. In waste management, sorting by type of garbage is one of the most essential steps. By using a camera, a convolutional neural network (CNN), and a dataset of training and testing pictures, detection and classification are achieved with an accuracy of 99%. A robotic arm is added to grab the garbage and bring it to the pre-defined bin. All hardware components communicate over a wireless network. This will help to protect people's health, especially that of garbage collection workers, increase work efficiency, and replace human labor in support of sustainable development.

Keywords: garbage, detection and classification, wireless network, robotic arm,

machine learning, convolutional neural network

1. Introduction

Most of the waste that humanity produces comes from industry, medical activities or everyday life. According to the Blue Environment report, Australia releases over 50 million tonnes of core waste into the environment every year, and the amount has been increasing dramatically. The same report states that hazardous waste amounted to 6.3 million tonnes, or 259 kg per capita, and that almost 60% of it was buried [1]. Garbage collection is usually done by hand. Therefore, our aim is to automatically classify the type of garbage, in our case bottles, nylon and scrap paper. Increasing the efficiency of collection is another vital target. Automatic trash collection offers many advantages, such as providing an additional source of raw materials and reducing the labour and cost needed to collect and reclassify trash.

The population of the world is increasing dramatically. Therefore, more and more garbage is being released into the environment. Moreover, much of this garbage is non-biodegradable, which makes classification important. Classifying garbage reduces the cost of making new materials, requires much less energy and helps to preserve the environment.

Recognising and categorizing garbage is a long-standing problem, and in many countries people are encouraged to find solutions to it. In developed countries such as America and England, artificial intelligence has been used to categorize trash. One well-known example comes from America, where an environmental company created a robot that helps residents to categorize garbage by using sensors. The robot can sense the type of garbage, remove the trash, drop it into the correct bin and process the trash directly in the bin [2]. However, sensors might not be the best solution because their accuracy is low and they can easily make mistakes. In addition, there are many types of garbage, and sensors are unable to recognise and sort all of them.

Artificial intelligence (AI) refers to computer systems that humans program to automate tasks and to learn behaviours similar to those of human beings. Many studies have shown that artificial intelligence can substitute for humans in learning and performing tasks that are dangerous to the human body, including categorizing trash. In this project, we use machine learning methods, in particular convolutional neural networks, to train on image datasets.

With the breakthroughs in artificial intelligence, many projects now sort garbage using AI. For instance, Bernando S. Costa used artificial intelligence, in particular the pre-trained architectures VGG-16 and AlexNet and other advanced algorithms, to classify garbage into different types [3]. Cenk Bircanoglu designed a project named RecycleNet, which employs deep neural networks to detect and classify garbage. The deep neural network used by this group is DenseNet121, which gave an accuracy of up to 95% [4]. In this project, however, light, temperature and humidity might strongly affect the detection result. In addition, these projects cannot be deployed widely because they are very expensive.

Machine learning algorithms offer considerable advantages such as efficient operation and high detection accuracy. In this project, the SSD (Single Shot MultiBox Detector) model is used. The workflow of this model is divided into 2 main parts:

First, SSD applies the convolutional network to the image only once to compute an activation map. A 3x3 kernel is then run over the map to predict the bounding boxes and class probabilities.

Second, anchor boxes of various sizes and aspect ratios are used, and the model learns offsets relative to them. After this, a bounding box is drawn together with a confidence percentage.
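
As a rough illustration of the anchor-box idea, the following minimal Python sketch generates default boxes on a small grid and applies a predicted offset to one of them. The grid size, scale, aspect ratios and the offset encoding used here are assumptions chosen for illustration; they are not the configuration used in this project.

```python
import math

def default_boxes(grid, scale, ratios):
    """Return (cx, cy, w, h) anchor boxes, normalised to the range [0, 1]."""
    boxes = []
    for row in range(grid):
        for col in range(grid):
            cx = (col + 0.5) / grid   # centre of the grid cell
            cy = (row + 0.5) / grid
            for r in ratios:
                boxes.append((cx, cy, scale * math.sqrt(r), scale / math.sqrt(r)))
    return boxes

def decode(anchor, offset):
    """Apply a predicted offset (dx, dy, dw, dh) to one anchor box."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offset
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

# Hypothetical settings: a 3x3 grid with three aspect ratios per cell
anchors = default_boxes(grid=3, scale=0.3, ratios=(1.0, 2.0, 0.5))
shifted = decode(anchors[0], (0.1, 0.0, 0.0, 0.0))
```

A zero offset leaves the anchor unchanged, so the network only has to learn small corrections relative to the nearest default box.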

2. Models and experimental methodology

Machine learning

Machine learning is one of the emerging fields that mark the fourth industrial revolution. Machine learning helps to solve problems such as object detection, speech recognition, etc. It also encompasses subfields such as neural networks, unsupervised learning, supervised learning and reinforcement learning [5].

Data preparation

The aim of this project is to detect and classify the type of garbage. To accomplish this target, 500 training images and 100 testing images are used. Four object categories are used to train the SSD model, as listed in Table 1. The images in the dataset show garbage on a green background (Fig 1).

Figure 1. Sample images of dataset collected in the laboratory

The garbage is placed in different orientations and proportions. Each image has a dimension of 440 x 330 pixels; the size of the training dataset is 13.6 MB and the size of the testing dataset is 2.72 MB.

Table 1. Information of the dataset collected

Objects        Quantities
Bottle         105
Nylon          98
Scrap Paper    89
Mixture        308

A training dataset and a testing dataset are both used because the training dataset is used to produce the feature map, while the testing dataset provides an unbiased evaluation of the feature map that is extracted [6].

After the pictures have been captured, a labelling process is needed. Each image contains one or more pieces of garbage. Therefore, each piece needs to be labelled so that its location in the image is known. Each label is a bounding box of varying dimensions, and each label represents only one class. The number of classes is counted before the training process begins. The labelling software used in this project is LabelImg (Fig 2).


Figure 2. Labelling image process

After the labelling process is complete, an .xml file is generated. This .xml file contains the starting coordinate (x1, y1) and the ending coordinate (x2, y2) of each bounding box. The height and width of the bounding box are also included. This information is needed for the server to train on the images and extract the feature map. However, the .xml file must first be converted into .csv, which in turn is converted into a TensorFlow record (.tfrecord) file for training.
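
The first conversion step, from .xml to .csv, can be sketched with the Python standard library as follows. The sample annotation, the file name and the CSV column names are hypothetical; they follow the Pascal VOC layout that LabelImg commonly writes, and the sketch is not the conversion script used in this project.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical LabelImg-style (Pascal VOC) annotation for one image
SAMPLE_XML = """<annotation>
  <filename>img_001.jpg</filename>
  <size><width>440</width><height>330</height></size>
  <object>
    <name>bottle</name>
    <bndbox><xmin>52</xmin><ymin>40</ymin><xmax>180</xmax><ymax>300</ymax></bndbox>
  </object>
  <object>
    <name>nylon</name>
    <bndbox><xmin>210</xmin><ymin>95</ymin><xmax>400</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""

def xml_to_rows(xml_text):
    """Return one row per labelled object in the annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([filename, width, height, obj.findtext("name"),
                     int(box.findtext("xmin")), int(box.findtext("ymin")),
                     int(box.findtext("xmax")), int(box.findtext("ymax"))])
    return rows

def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["filename", "width", "height", "class",
                     "xmin", "ymin", "xmax", "ymax"])
    writer.writerows(rows)
    return buffer.getvalue()

rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
```

Each CSV row holds one bounding box, so an image with several pieces of garbage produces several rows that share the same file name.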

Convolutional Neural Network (CNN) architecture

A convolutional neural network is a type of multilayer neural network that is used to detect objects. A CNN has 2 parts: hidden layers that perform feature extraction, and fully connected layers that perform classification. The architecture of this network is shown in Fig 3.

Figure 3. Convolutional Neural Network architecture

Firstly, the image is converted into a 3-dimensional (3D) array. Then, a kernel is applied to convolve the 3D array. The convolution is done by flipping the kernel and placing it over the 3D array. Next, the kernel slides across the array; at each position the overlapping values are multiplied and the products are summed. After the convolution is done, the results are written to the feature layer. There are many kernels, and each kernel is responsible for one detection category, such as line or edge detection. Each of these contributes to a higher-level form of detection, namely object detection. A pooling layer decreases the dimensions of the layer and significantly reduces the learning time. Therefore, a max pooling layer is added after each kernel, in which the largest value in each window is chosen. A bounding box is drawn around the object.

The last part is the classification network. The extracted features are flattened into a single 1D vector, and this vector is sent to the fully connected layers. These fully connected layers determine the accuracy (confidence) of the bounding box.
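
The convolution and max pooling steps described in this section can be sketched in plain Python as follows. For brevity the sketch works on a single-channel 2D array rather than a 3D image, and the image and the edge kernel are hypothetical examples.

```python
def convolve2d(image, kernel):
    """Valid 2D convolution: flip the kernel, slide it, multiply and sum."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * flipped[di][dj]
            out_row.append(total)
        output.append(out_row)
    return output

def max_pool2d(feature, size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    pooled = []
    for i in range(0, len(feature) - size + 1, size):
        pooled.append([
            max(feature[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature[0]) - size + 1, size)
        ])
    return pooled

# 5x5 image with a vertical edge between columns 1 and 2
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Horizontal-difference kernel (hypothetical example of an edge filter)
kernel = [[1, -1]]
feature_map = convolve2d(image, kernel)
pooled = max_pool2d(feature_map)
```

The feature map responds only where the intensity changes, and pooling halves each dimension while keeping that response.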

SSD model

SSD is a detection model that consists of a MobileNet architecture followed by several convolutional layers to detect objects. In this project, the SSD object detection model starts with MobileNet, a CNN architecture. This combination is called SSD-MobileNet [7], and its architecture is shown in Fig 4.

Figure 4. SSD-MobileNet architecture

MobileNet is a base network architecture that is pre-trained to classify objects on huge image datasets. This architecture converts images and exports position vectors as a matrix. MobileNet uses two kinds of convolution: depthwise convolution and pointwise convolution. Depthwise convolution filters each input channel with a single kernel, while pointwise convolution builds new features by combining the input channels. After that, the features are passed to the SSD model.
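
To indicate why splitting a convolution into depthwise and pointwise steps is attractive, the sketch below compares the number of multiply-accumulate operations of a standard convolution with that of a depthwise separable one. The layer dimensions are hypothetical and are not taken from the network used in this project.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Multiply-accumulate count of a standard k x k convolution."""
    return k * k * c_in * c_out * h * w

def depthwise_separable_cost(k, c_in, c_out, h, w):
    """Depthwise (one k x k filter per channel) plus pointwise (1 x 1) cost."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernels, 32 -> 64 channels on a 56x56 feature map
standard = standard_conv_cost(3, 32, 64, 56, 56)
separable = depthwise_separable_cost(3, 32, 64, 56, 56)
ratio = separable / standard   # equals 1/c_out + 1/k**2
```

For this example the separable form needs roughly one eighth of the operations, which is one reason such architectures suit small devices.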

TensorFlow pre-trained models

Our project uses the SSD-MobileNetV2 pre-trained model provided with TensorFlow. TensorFlow is an open-source library built by the Google Brain team to train neural network models. TensorFlow can run on Graphics Processing Units (GPUs) or on mobile devices such as phones and tablets. Many projects have been built with TensorFlow in areas such as speech recognition, information steganography, computer vision, etc.

Loss functions


To evaluate our trained models, loss functions are used in this project. Loss functions determine the error between the outputs of the algorithm and the target values. The value of the loss grows with the inaccuracy of the model. It can be calculated on the training dataset and on the testing dataset, and the two values compared. In all algorithms, and especially in this project, the loss function should be minimized. In the ideal case, the loss function equals zero.

In this project, cross-entropy classification loss is used. Cross-entropy loss is an entropy-based measure of the difference between two probability distributions [8]. An important property is that this loss function heavily penalizes predictions that are confident but wrong. The cross-entropy classification loss can be determined by the formula below:

$$\text{Cross-entropy}(D, L) = -\sum_{i} L_i \log(D_i) \tag{1}$$

where D is the predicted probability distribution and L is the target (label) distribution.

In addition to the classification loss, a regression loss is added; the regression loss used is smooth L1 [9]:

$$\text{SmoothL1}(x) = \begin{cases} \frac{1}{2}x^2 & \text{if } |x| < 1 \\ |x| - \frac{1}{2} & \text{otherwise} \end{cases} \tag{2}$$
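
The two loss terms, in their standard textbook forms, can be sketched in Python as below. The example probabilities are hypothetical and serve only to show that a confident but wrong prediction is penalized much more heavily than a confident and correct one.

```python
import math

def cross_entropy(predicted, label):
    """-sum_i label_i * log(predicted_i) for two discrete distributions."""
    return -sum(l * math.log(p) for p, l in zip(predicted, label) if l > 0)

def smooth_l1(x):
    """0.5 * x^2 if |x| < 1, otherwise |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

label = [0.0, 1.0, 0.0]                 # one-hot target: class 1 is correct
confident_right = [0.05, 0.90, 0.05]    # hypothetical prediction, correct
confident_wrong = [0.90, 0.05, 0.05]    # hypothetical prediction, wrong

loss_right = cross_entropy(confident_right, label)
loss_wrong = cross_entropy(confident_wrong, label)
box_error_small = smooth_l1(0.5)    # quadratic region
box_error_large = smooth_l1(-2.0)   # linear region
```

Smooth L1 behaves quadratically for small coordinate errors and linearly for large ones, which keeps large box errors from dominating the gradient.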

4 Degree of Freedom robotic arm (4 DoF)

To replace human labor, a 4 DoF robotic arm [10] is added to grab the garbage and put it into the correct pre-defined bin. By using inverse kinematics, the angles of movement can be determined from the coordinates of the bounding box after detection. Once the angles have been found, the pulse for each motor is calculated from the angle value.

Robot kinematics is the study of the motion of robots without considering the forces that cause it. There are 2 types of kinematics: forward kinematics and inverse kinematics (Fig 5).

Figure 5. Schematic diagram of inverse kinematics

Forward kinematics is usually straightforward and always has a solution. Inverse kinematics, however, is harder and requires computationally heavy and complex mathematical equations. The 4 DoF schematic diagram is shown below:


Figure 6. Inverse kinematics of 4 DOF robotic arm model

To calculate the exact angles of rotation for the robotic arm to reach the object, the following system of equations is used:

$$\theta_1 = 2\,\operatorname{atan2}\!\left(270a - \sqrt{-a^4 - 2a^2b^2 + 79668a^2 - b^4 + 79668b^2 - 11451456},\; a^2 + b^2 + 270b - 3384\right)$$

$$\theta_2 = 2\,\operatorname{atan2}\!\left(294a - \sqrt{-a^4 - 2a^2b^2 + 79668a^2 - b^4 + 79668b^2 - 11451456},\; a^2 + b^2 + 294b - 3384\right)$$

where:

distance = -333.333 * (rect['x1'] + rect['x2']) / 2 + 396.666

a = distance - 85

b = distance - 80

and rect['x1'] and rect['x2'] give the position of the garbage in the image.

Servo 3 controls the rotation angle about the z axis, and servo 4 opens and closes the gripper. These parts of the robotic arm are not discussed further because they are not the main focus of this project.
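
To make the inverse-kinematics step more concrete, the sketch below solves the textbook two-link planar problem and checks the result with forward kinematics. It is not the closed-form expression used by the authors: the link lengths of 135 and 147 are assumed (half of the constants 270 and 294 that appear in the equations), the vertical target of 50 is hypothetical, and the bounding-box coordinates are assumed to be normalised to the range 0 to 1. Only the linear distance mapping is taken from the text.

```python
import math

L1, L2 = 135.0, 147.0  # assumed link lengths (hypothetical values)

def bbox_to_distance(x1, x2):
    """Linear mapping from the box centre to a distance, as given in the text."""
    return -333.333 * (x1 + x2) / 2 + 396.666

def inverse_kinematics(x, y):
    """Return (shoulder, elbow) angles in radians for a planar two-link arm."""
    cos_elbow = (x * x + y * y - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target is out of reach")
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(L2 * math.sin(elbow),
                                             L1 + L2 * math.cos(elbow))
    return shoulder, elbow

def forward_kinematics(shoulder, elbow):
    """Return the (x, y) position reached for the given joint angles."""
    x = L1 * math.cos(shoulder) + L2 * math.cos(shoulder + elbow)
    y = L1 * math.sin(shoulder) + L2 * math.sin(shoulder + elbow)
    return x, y

distance = bbox_to_distance(0.4, 0.6)          # hypothetical detection
shoulder, elbow = inverse_kinematics(distance, 50.0)
reached = forward_kinematics(shoulder, elbow)  # should match the target
```

Running the forward model on the computed angles is a simple way to confirm that a solution actually reaches the requested point before any pulse is sent to the servos.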

Message Queuing Telemetry Transport (MQTT)

MQTT is a lightweight and simple messaging protocol designed for constrained devices [11]. The MQTT protocol consists of 2 main parts: the server and the client. Clients are further divided into 2 kinds: the machine that sends information and the machine that provides feedback. In this project, the server is the 4 DoF robotic arm, the sending machine is the camera, and the feedback machines are the 3 trash bins. The reliability of MQTT is managed by 3 Quality of Service (QoS) levels: at level 0, a message is sent at most once and no acknowledgement of reception is required; at level 1, a message is sent at least once and acknowledgement of reception is required; at level 2, a four-way handshake mechanism ensures that a message is delivered exactly once [12].
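
The publish/subscribe pattern behind MQTT can be illustrated with the toy in-memory broker below. It is only a sketch of the message flow: it does not implement the MQTT wire protocol, networking or the QoS levels, and the class name, topic names and payload fields are hypothetical.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a broker (no networking, no QoS)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback that receives every message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver the payload to all callbacks subscribed to the topic."""
        for callback in self._subscribers[topic]:
            callback(topic, payload)

received = []
broker = ToyBroker()
# Hypothetical topic: a subscriber waits for detections sent by the camera
broker.subscribe("garbage/detected",
                 lambda topic, payload: received.append(payload))
broker.publish("garbage/detected", {"label": "bottle", "x1": 0.4, "x2": 0.6})
broker.publish("garbage/other", {"label": "ignored"})  # no subscriber
```

Because publishers and subscribers only share a topic name, the camera, the arm and the bins can be developed and replaced independently.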

Algorithm charts


a. diagram of training

b. diagram of testing

Figure 7. Algorithm diagram of system

Our project has 2 parts: recognising and sorting. In the first part, the model is trained by feeding the labelled images into it (Fig 7a). Training is stopped when the loss function reaches 0.5. In the sorting process, the robotic arm is controlled to move and drop the garbage (bottle, nylon or paper) into the corresponding bin (Fig 7b).

3. Results and discussions

This section presents the evaluation of this method for garbage recognition and a detailed analysis of the loss functions. The results and discussion are divided into four parts: the classification loss function, the smooth L1 regression loss, manual control of the 4 DoF robotic arm, and the detection and classification results.


Cross-entropy in classification loss function

For the trash dataset, the reported error was high until the 1000th iteration and then started to go down. At the start of training, more and more features of the images are extracted by simple filters, and therefore the loss function decreases dramatically. As more features are extracted, the objects can be identified and classified, and the loss function then decreases more slowly. The graph shows a clear trend, which indicates that the model has found a solution. After more than 90000 steps, the final loss value is 1.15 (Fig 8).

Figure 8. Classification loss graph

Smooth L1 loss in localization loss function

Localization is a particular concern for SSD because this model can recognise large objects accurately, but its accuracy decreases as the objects get smaller. As the model is trained for more steps, the value of the localization loss decreases. This is because the predicted box at first mismatches the ground-truth bounding box (the labelled box) considerably, which leads to a low intersection over union (IoU) value. As more and more kernels are used to extract features, the object is detected more accurately and the localization loss decreases, as shown in Fig 9.


Figure 9. Localization loss graph
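
The IoU value mentioned in this section measures how well a predicted box overlaps the labelled box. A minimal Python sketch follows; the box coordinates are hypothetical pixel values chosen to contrast a poor early prediction with a good late one.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

ground_truth = (50, 50, 150, 150)
early_prediction = (120, 120, 220, 220)   # poor overlap early in training
late_prediction = (55, 55, 150, 150)      # close match after more steps

early_iou = iou(ground_truth, early_prediction)
late_iou = iou(ground_truth, late_prediction)
```

An IoU of 1 means the boxes coincide and 0 means they do not overlap at all, so a rising IoU accompanies a falling localization loss.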

Manual control of 4 DoF robotic arm

Our robotic arm is based on the EEZYbotARM_MK2 model from EEZYrobots [13], which we modified to suit this project. The 4 DoF arm was produced using 3D printing technology, as shown in Fig 10.

a. Model 4 DoF

b. Printed 4 DoF

Figure 10. 4 DoF robot model

To control and calibrate the robotic arm, a slider-based control program was written in the Python programming language. The slider interface uses Qt Creator, installed on the Raspberry Pi board. For the pulse period from 1 ms to 2 ms, we create a corresponding pulse controller with values from 500 to 2500, as shown in Fig 11.


Figure 11. Manual control
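
The relation between a servo angle and the pulse value can be sketched as a simple linear mapping. The 500 to 2500 range comes from the text above; the 180 degree travel and the clamping behaviour are assumptions, and the function name is hypothetical.

```python
def angle_to_pulse(angle_deg, min_pulse=500, max_pulse=2500, max_angle=180.0):
    """Linearly map a servo angle to a pulse value, clamped to the valid range."""
    angle = min(max(angle_deg, 0.0), max_angle)   # keep the angle in range
    return round(min_pulse + (max_pulse - min_pulse) * angle / max_angle)

centre = angle_to_pulse(90)       # middle of the assumed travel
clamped = angle_to_pulse(-10)     # out-of-range input is clamped to 0 degrees
```

Clamping protects the servo from commands outside its mechanical range, which is useful while calibrating with a slider.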

Detection and classification

SSD has been applied to the training and testing data, and the result is shown in Fig 12.

Figure 12. Detection and classification

The model detects objects with a confidence of up to 99% at 0.97 frames per second. Bounding boxes are drawn around all detected objects. In testing, the model recognizes garbage once it has been placed in the recognition area. The final prototype of our project is shown in Fig 13.


Figure 13. Prototype of the garbage classification system

4. Conclusion and future work

A method for fast and accurate recognition and classification of garbage based on a convolutional neural network was proposed. In this project, by using SSD-MobileNetV2, the server can identify and classify 3 types of garbage: bottle, nylon and scrap paper. All garbage is labelled and its position is sent to the 4 DoF robotic arm, which picks up the garbage and places it in the correct trash bin. With SSD-MobileNetV2, the recognition and classification of garbage achieved a high level of accuracy, but the authors intend to seek better training methods.

The performance of the project can be improved as follows:

More training data should be collected and more models investigated. This will improve the performance of the classification.

The regions in the training images must be labelled carefully for more accurate training.

Because the model currently recognises 3 types of garbage, more types can be added to compare and evaluate the training model.


References

[1]. Blue Environment, “National Waste Report 2018”, Randell Environmental Consulting, 19 November 2018, Page 8, Category 2.1

[2]. Lori Ioannou, https://www.cnbc.com/2019/07/26/meet-the-robots-being-used-to-help-solve-americas-recycling-crisis.html, accessed 27 July 2019

[3]. Bernando S. Costa, “Artificial Intelligence in Automated Sorting in Trash Recycling”

[4]. Cenk Bircanoglu, M. Atay, F. Beser, “RecycleNet: Intelligent Waste Sorting Using Deep Neural Networks”, July 2018

[5]. Encyclopedia of Machine Learning (Pages 3, 4)

[6]. Frank R. Burden, Richard G. Brereton and Peter T. Walsh, “Cross-validatory Selection of Test and Validation Sets in Multivariate Calibration and Neural Networks as Applied to Spectroscopy”

[7]. Cemil S., “Real-Time Diseases Detection of Grape and Grape Leaves using Faster R-CNN and SSD MobileNet Architectures”, April 2019

[8]. S. Panchapagesan, M. Sun, A. Khare, “Multi-task learning and Weighted Cross-entropy for DNN-based Keyword Spotting”, Causal Productions

[9]. Ibrahim Onaran, “Sparse spatial filter via a novel objective function minimization with smooth l1 regularization”, Elsevier, 8 November 2012

[10]. Serdar Kucuk and Zafer Bingul, “Robot Kinematics: Forward and Inverse Kinematics”, IntechOpen

[11]. MQTT.org, “Mq telemetry transport”, http://mqtt.org/, 2013, accessed 18/01/2020

[12]. D. Thangavel, X. Ma, A. Valera and H. Tan Sense, “Performance Evaluation of MQTT and CoAP via a Common Middleware”

[13]. Carlo Franciscone, http://www.eezyrobots.it/eba_mk2.html, accessed 18/01/2020
