Thai Digit Recognition on License Plates using YOLOv3
by
Nadimpalli Lakshmi Manasa
A thesis submitted in partial fulfillment of the requirements for the
degree of Master of Engineering in
Microelectronics and Embedded Systems
Examination Committee: Dr. Mongkol Ekpanyapong (Chairperson)
Dr. Matthew N. Dailey
Dr. A.M.Harsha S. Abeykoon
Nationality:
India
Previous Degree: B.Tech Electronics and Communications Engineering
Jawaharlal Nehru Technological University,
Hyderabad
Telangana, India
Scholarship Donor: AIT Fellowship
Asian Institute of Technology
School of Engineering and Technology
Thailand
May 2019
ACKNOWLEDGEMENTS
I am grateful to my family for their love, support, and motivation.

I take this opportunity to thank my advisor, Dr. Mongkol Ekpanyapong, whose support, valuable suggestions, guidance, and encouragement have helped me throughout the completion of my thesis.

I am grateful to Mr. Chatchai and Mr. Clifford for their assistance in assembling the GPU and CPU. I would like to thank the committee members, Dr. A. M. Harsha S. Abeykoon and Dr. Matthew N. Dailey, for their valuable comments and suggestions on my thesis work. I thank Mr. Vasan Timpong and Mr. Teerapon for sharing the data taken from the Thai military base.

N. L. Manasa
May 2019
ABSTRACT
Identification of the license plate and the Thai numerals on it, including the logo, is proposed using a deep learning method. A dataset of the Thai numeral characters (0-9) is collected and constructed to train a convolutional neural network (CNN). CNNs have achieved state-of-the-art results in tasks such as optical character recognition, generic object recognition, real-time face detection and pose estimation, speech recognition, and license plate recognition. The method detects and recognizes all the Thai numeral characters (0-9) using a CNN model, and a character recognition system is designed and implemented with the trained model. In this thesis we use the object detection algorithm YOLOv3 built on the Darknet network architecture. The proposed method performs character segmentation and recognition in a given license plate image. The experimental results indicate that the proposed deep learning method is efficient in detecting the characters.
Keywords: Thai Digit Recognition, License plate, deep learning, convolutional neural
networks, YOLOv3
TABLE OF CONTENTS

CHAPTER TITLE

Title page
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables

1 Introduction
1.1 Description
1.2 Problem statement
1.3 Objectives
1.4 Limitations and Scope

2 Literature Review
2.1 Background
2.2 License Plate Detection
2.3 Uses of license plate recognition
2.4 Universal OCR
2.5 Character Recognition
2.6 Thai License Plate
2.7 Convolutional neural networks
2.8 YOLOv3

3 Methodology
3.1 Data collection
3.2 Annotating Images
3.3 Training and Testing

4 Experimental results
4.1 Results

5 Conclusion and Recommendations
5.1 Conclusion
5.2 Recommendations

References
LIST OF FIGURES

FIGURE TITLE

1.1 Image and characters segmented
2.1 Thai license plate
2.2 License plate with 2 letters and 4 serial numbers
2.3 License plate with a number followed by 2 letters and 4 serial numbers
2.4 Examples of Thai military license plates
2.5 Typical CNN architecture
2.6 YOLOv3 architecture
3.1 Data collection for Thai license plates from a video
3.2 Image labelled using BBox-label tool
3.3 Image labelled using LabelImg
3.4 YOLO format of txt files
3.5 Intersection over union
3.6 Screenshot taken from terminal during training of images
3.7 Representation of algorithm
3.8 Display text on image
3.9 Display text on video frame
4.1 (a) Screenshot from prediction window for detecting single digit
4.1 (b) Screenshot from prediction window for detecting single digit
4.2 (a) License plate detection from an image
4.2 (b) Confidence of predicted image in terminal
4.3 (a) License plate detection from an image
4.3 (b) Confidence of predicted image in terminal
4.4 (a) Detection of all digits in license plate
4.4 (b) Confidence of each digit observed in terminal
4.5 (a) Detection of all digits in license plate
4.5 (b) Miss-prediction in detecting digits
4.6 Wrong prediction of license plate in an image
4.7 Detection tested on full car image
4.8 Screenshot of terminal window after prediction
4.9 Result tested under different light conditions
4.10 Missed predictions of digits
4.11 Confidence of predicted digits of an image from East Entrance gate
4.12 Results displaying whole license plate number on the image
4.13 Graphical representation of average loss vs. number of iterations
LIST OF TABLES

TABLE TITLE

2.1 Comparison of different approaches to license plate detection
2.2 Comparison of ALPR using deep learning techniques
4.1 Testing videos on new data and results obtained
4.2 Overall accuracy on videos tested on new data
4.3 Precision values of cropped license plates with Thai digits
4.4 Precision values of Thai digits along with alpha-numeric numbers
4.5 True positive, false positive, false negative comparison
CHAPTER 1
INTRODUCTION

1.1 Description
Automatic number-plate recognition (ANPR) is a technology that uses optical character recognition on images to read vehicle registration plates and derive vehicle location data. It can use existing closed-circuit television, road-rule enforcement cameras, or cameras specifically designed for the task. ANPR is mainly used by police forces around the world for law enforcement purposes, to check whether a vehicle is registered or licensed. It is also used for electronic toll collection on pay-per-use highways and as a method of monitoring traffic movement, for example by highways agencies.
Automatic number plate recognition can be used to store the images captured by the cameras as well as the text read from the license plate, and it can also be configured to capture and store a photo of the vehicle's driver. Existing systems commonly use infrared lighting to allow the camera to take the picture at any time of day or night. ANPR technology must take into account plate variations from place to place.

The main concerns about these systems have focused on privacy fears of governments tracking citizens' movements, misidentification, high error rates, and increased government spending. To identify the characters on a number plate, the software involves several stages that together determine and recognize the license plate accurately: plate localization, plate orientation and sizing, normalization, character segmentation, and optical character recognition. The complexity of each of these stages determines the accuracy of the system. During normalization, some systems use an edge-detection approach to increase the contrast between the letters and the plate.
Figure 1.1: Image and Characters segmented
(source: https://en.wikipedia.org/wiki/Automatic_number-plate_recognition)
Text recognition from natural scene images belongs to the field of pattern recognition, an application of image processing techniques. Pattern recognition studies how a system or machine can observe and analyze its environment and learn to distinguish patterns. Machine learning is used for detecting the patterns of various objects, and it comes in two types: supervised learning and unsupervised learning. Supervised learning learns from past experience or with the help of a teacher; unsupervised learning draws inferences from a dataset without labels. Text recognition from natural scene images begins with image acquisition, i.e., taking images from different sources, followed by pre-processing, character segmentation, and feature extraction. Pre-processing includes removing noise or resizing the image for proper segmentation. Various methods for character segmentation are described in the literature. For achieving high recognition performance, the most important consideration is the choice of feature extraction method, and several feature extraction methods appear in the literature. The diagonal-based feature learning method, used together with a genetic algorithm, provides good recognition accuracy; template matching is another method for feature extraction.
1.2 Problem statement
In general, number plates contain alpha-numeric characters, as is common in most countries. These alpha-numeric characters uniquely identify a vehicle within the issuing region's database. But some license plates include the local language of their country, and these characters are difficult to recognize because they take different forms with different fonts, so they need to be trained with suitable algorithms to identify the characters accurately. In the past, optical character recognition was used for this task; nowadays, machine learning algorithms can be trained on any kind of data and recognize the characters more efficiently than the existing approaches.
1.3 Objectives
To recognize Thai numerals on a license plate in an image by creating a dataset of characters in the Thai language and training them using the YOLO method.

• Design a system which can detect Thai digits in a license plate image using YOLOv3.
• Collect a large dataset from the Thai military base to train the model. The dataset includes different light conditions from all environments, including low light.
1.4 Limitations and Scope
Various researchers have developed several methods and techniques for this application. However, all the techniques have their own advantages and disadvantages. Moreover, each country has its own system of numbering license plates, with its own backgrounds, sizes, colors, and character languages. Although some studies have been conducted on LP detection and recognition, this work differs from previous studies because both LP detection and recognition are performed by a deep learning architecture, a CNN model. To improve the system, future work will focus on improving the detection and recognition accuracy under the various constraints mentioned above. In addition, the system could be developed into a real-time application on new devices (smartphones, tablets, etc.) for use in a mobile environment.
CHAPTER 2
LITERATURE REVIEW
2.1 Background
The Automatic License Plate Recognition (ALPR) method helps recognize the number plate of a vehicle efficiently, without the need for major human resources or human intervention, and it has become increasingly important in recent years. There are several reasons why the need for identification has increased. There is a growing number of cars on the roads, and all of them carry license plates. The quick development of digital image processing technology has made it possible to identify and detect license plates at a faster rate. The whole process may be done in less than 50 ms, giving 20 frames per second, which is sufficient to process real-time video streams.
Identification of vehicle number plates is useful for many different operators. For example, it is used by government agencies to find cars and other vehicles that are involved in crime, to check whether annual fees are paid, or to identify the owner of a vehicle who violates the traffic rules. Many countries, such as the U.S., Japan, Germany, Italy, the U.K., and France, have successfully applied and implemented ALPR in their traffic management. Several private operators may also benefit from ALPR systems. One such case, which inspired the system developed in related projects, is a parking ticket payment system established by the Trondheim-based company WTW AS and used by major parking companies in Norway. The system allows users to register the license plate number of their car, either through a mobile application or through a message sent along with the parking time they want to pay for. If a parking attendant wants to check whether a car has a valid parking ticket, he must manually enter the car license plate number and search a database. Each entry takes no more than a few seconds, but since this is the attendant's main task and hundreds of vehicles must be entered during a workday, this becomes very difficult and burdensome. There is also an upper limit on how many cars a parking attendant can check during a working day. One way to solve this problem and reduce the manual work is to mount cameras on vehicles that drive around the parking lot and photograph or film the parked cars' license plates. The ALPR system's main purpose is to recognize the license plate from the image or video stream, look it up in a database, and see if the parking ticket for the vehicle is valid. The requirements for such an ALPR system are high accuracy when reading the license plates and fast processing time.
The difficulty of recognizing the license plates in different test sets impacts the accuracy of the system, making direct comparisons of accuracy without considering the test-set complexity meaningless. As pointed out by the authors, it is inappropriate to declare which method gives the highest performance because of the lack of uniform ways to evaluate them.

A typical ALPR system can be split into three major stages:
1. License plate detection - detect the plate in the captured image.
2. Character/digit segmentation - extract the alphanumeric characters from the plate.
3. Digit recognition - recognize each individual digit on the license plate.

Each stage has been implemented using various machine learning techniques. Traditional machine learning techniques rely on features chosen by humans to represent the underlying features of the image.
2.2 License Plate Detection
The license plate detection step greatly influences the accuracy of the following steps. The input is an image containing none, one, or multiple license plates, and the output should be the portion of the image containing only the license plates. A number of methods and algorithms have been proposed in recent years to solve this challenging problem. Accuracy has always been an issue, and although it has improved in recent years, locating a license plate in captured images from a given viewpoint under factors such as occlusion and illumination (different light conditions affecting the process) is still a challenge. The brute-force method of processing each pixel in an image gives a very high processing time. As an alternative, the commonly used approach is to exploit notable features of the license plate and only process the pixels which have these features, reducing the processing time considerably.
License plate recognition combined with optical character recognition is a combination of integrated hardware and software that reads the license plates of vehicles without the need for human intervention. The main purpose is to identify and locate the vehicle properly, and the main aim is to replace manual systems with automated ones based on license plates. Automatic number plate recognition is a large-scale surveillance process which uses optical character recognition on images to read the plates on different types of vehicles. It can use closed-circuit television or road-rule enforcement cameras, or ones specifically designed for the task. It is used by various police forces, as a method of toll collection on pay-per-use roads, and to monitor traffic activity in large cities. ANPR is used to store the images captured by the cameras as well as the text from the license plate. The systems commonly use infrared lighting to allow the cameras to take pictures at any time of day. The systems tend to be region specific due to plate variation from place to place.
License plate recognition (LPR) has received tremendous interest in past and recent years as a challenging research topic. This is due to the fact that the conditions (e.g., light, color, dirt, shadows, character sharpness, language, etc.) and types of license plates vary from place to place. LPR has become an important part of many applications, for example road safety enforcement, automatic parking lot control, automatic toll collection, speed limit enforcement, and vehicle tracking and identification. An LPR system may be installed as part of a traffic monitoring system working together with traffic lights in order to identify cars that break traffic rules or to detect prohibited vehicles. In many cases, an LPR system is useful for detecting motorcycles with dangerous behaviors, for example riding against the traffic direction, riding over the speed limit, or riding without a helmet. A huge number of cars and vehicles are used in Thailand, where these activities are regularly found and the number of accidents arising from them is extremely high. These days motorcycles are widely used as household vehicles and by students to go to school. Therefore, road safety enforcement is now getting high attention as an important issue in order to reduce the possibility of accidents caused by irresponsible motorists. Although many robust approaches have been employed by prior research, the deep learning approach has gained dramatic attention in recent years.
Until now, OCR tasks have been solved by applying several steps: text detection, segmentation with different pre-processing, and character recognition as the final step, which involves feature extraction and classification. Different text detection methods depend on feature engineering techniques using boundary features (e.g., the boundary of the license plate), color features (the specific color of the plate), texture features (color transitions on the plate), or character features. There are various feature extraction methods for character recognition which depend on the image representation form (grey level, binary, vector), such as template matching. There are also modern approaches to character recognition involving the most advanced techniques in deep learning; as an example, deep convolutional neural networks (CNNs) are used for the multi-digit number recognition task.
C. Anagnostopoulos, I. Anagnostopoulos, V. Loumos, and E. Kayafas, "A license plate-recognition algorithm for intelligent transportation system applications": In this paper, a new algorithm for vehicle license plate identification is proposed, on the basis of an adaptive image segmentation technique (sliding concentric windows) and connected component analysis with a character recognition neural network. The algorithm was tested with natural-scene grey-level vehicle images of different backgrounds and ambient illumination. The camera focused on the plate, while the angle of view and the distance from the vehicle varied according to the experimental setup.
Kaushik Deb, Md. Ibrahim Khan, Anik Saha, and KangHyun Jo (2012): A segmentation technique using sliding concentric windows (SCW). The license plate is extracted from natural properties by finding vertical and horizontal edges in the vehicle region. A novel adaptive image segmentation technique detects the candidate region, and color verification of the candidate region uses hue and intensity in the HSI color model to verify green/yellow LPs and white LPs, respectively. They focused on a new artificial neural network (ANN) algorithm based on the Korean number plate system.
M. I. Khalil (2010): This LPR system consists of several modules: image acquisition, license plate extraction, and segmentation and recognition of individual characters. After the license plate extraction phase, an information recognition phase (IPR) is applied using a "moving window technique". For recognizing the license plate image, the country name is loaded as the source image; then the first image entry of the country image set is loaded as an object, and the moving window technique is applied to detect that object within the image.
J. M. Guo and Y. F. Liu, "License plate localization and character segmentation with feedback self-learning and hybrid binarization techniques": License plate localization (LPL) and character segmentation (CS) play key roles in the license plate (LP) recognition system, and this paper addresses these two issues. In LPL, histogram equalization is employed to solve the low-contrast and dynamic-range problems; texture properties, e.g., aspect ratio and color similarity, are used to locate the LP; and the Hough transform is adopted to correct the rotation problem.
2.3 Uses of license plate recognition
Check on authenticity of abandoned vehicles: This method is used by police forces for law enforcement purposes in different scenarios. The application enables them to take a quick look and check the authenticity of number plates installed on abandoned vehicles via mobile devices, which can help curb illegal acts and anti-social elements. An additional fixed camera focuses on the car driver's face and captures and saves the image for future security checks. In addition, this technology does not need a separate installation for each individual vehicle, unlike technologies that install a transmitter in each vehicle.

Automation of electronic toll collection: LPR can be used for automating the entry of vehicles through toll barriers or toll gates. It combines with an access control system to recognize number plates listed in the toll collection database.
Table 2.1: Comparison of different approaches to license plate detection

| Proposed method | Methodology | Results |
|---|---|---|
| SSD detector | The license plate is detected by an SSD detector, and then a vertical projection method is applied for character segmentation. | Model: LeNet. Testing on MNIST dataset: 99.03%. Custom data: 73.27%. |
| CNN | Bangla character recognition is performed by training with a gradient-based learning algorithm. Large samples of each character are collected and features are extracted. Performance is evaluated using a MATLAB-based deep learning framework. | Achieved 82.2% accuracy when tested on different samples. Limitations were smaller memory and computational power. |
| Brazilian license plate detection using a CNN model | Two CNN networks are created: one (FV/LPD-NET) to detect the car front view and one (LPS/CR-NET) to recognize characters. | All seven characters detected correctly in 63.18% of cases; considering partial matches, around 90.55%. |
| Smart LPR based on image processing using a neural network | A fusion technique is used to extract the license plate, isolate the characters, and identify them using an ANN. | The neural network recognizes correct characters at 95% with added noise. |
| Number plate recognition without segmentation using CNN | A network is constructed which includes convolutional, pooling, and FC layers. | Achieved an accuracy of 88.6% on a test set of 700 plate images. |

Comparison: different convolutional architectures used for license plate detection. Table 2.2 shows the time it takes to process an image and the prediction accuracy on the VOC2007 dataset.
Table 2.2: Comparison of ALPR using deep learning techniques

| Architecture | mAP | FPS |
|---|---|---|
| R-CNN | 66.0 | 0.05 |
| Fast R-CNN | 70.0 | 0.5 |
| YOLO | 63.4 | 45 |
| YOLOv2 (288 × 288) | 69.0 | 91 |
| YOLOv2 (352 × 352) | 73.7 | 81 |
| Proposed method (YOLOv3) | 85 | 30 |
2.4 Universal OCR
There has been no prominent attempt at universal OCR. In most cases, a two-class classifier is employed in the pre-processing step to separate input characters (or words) into handwritten characters and machine-printed characters, and each character is then fed into an OCR module for handwritten or machine-printed characters according to its type. One recent attempt at this separation task is Zagoris et al. A possible reason why universal OCR has not been tried is the difference between the distributions of handwritten and machine-printed characters. Roughly speaking, handwritten characters have a rather anisotropic and wider Gaussian distribution (or Gaussian mixture), whereas machine-printed characters have a more isotropic and narrower one. This difference might lead to the use of different recognition techniques.
2.5 Character Recognition
The last step is to recognize each segmented character, also known as optical character recognition (OCR). This can be seen as an image classification problem with one class per alpha-numeric character. In total there are 36 possible classes when analyzing most western license plates: 26 letters and 10 digits. Existing methods can be split into two categories, template-matching-based and learning-based approaches, each with its own advantages and disadvantages.

Learning-based approaches use machine learning techniques to discriminate characters based on one or multiple features. Jiao et al. use a neural network that learns to discriminate based on image density. A number of CNN architectures also work well for this task; CNNs use multiple features and do not require the features to be defined in advance. In some cases, a pre-trained 9-layer CNN model sliding across the bounding boxes is used.
Related Research

Many researchers are trying to apply new techniques to detect car license plates. The techniques and procedures that researchers have used for license plate detection are as follows:

1. Sobel edge detection combined with texture and moving windows to detect the license plate location.
2. Region-based techniques to find the license plate location.
3. Gray level variation to detect a vehicle license plate.
4. Genetic algorithm techniques to find a car license plate.

All the license plate detection techniques mentioned above are the first step to identify an exact location and a clear image of a license plate before the recognition processes.
2.6 Thai License Plate
The Department of Land Transportation of Thailand has imposed rules, regulations, and specifications for Thai license plates. Each car must be classified by its purpose of service, and the plate is designed so that the service can be distinguished and identified at a single glance. This is done through the color of the printed characters and the background color of the plate. For example, a private car has a plate with a white background and black printed characters. A public car such as a taxi has black characters printed on a yellow plate, while a special servicing car or a limousine has black characters on a green plate. Figure 2.1 shows an example of a Thai license plate.

Figure 2.1: Thai License Plate
(source: https://www.beamng.com/resources/thailand-licence-plate-pack.2386/)

There are two rows of characters on the plate, clearly printed at the center of each row. The upper row is divided into two parts: the category on the left and the running number on the right. The category part consists of one or two Thai consonants, each from one of 29 characters (out of all 44 consonants), or a digit between 0 and 9. The running number is made up of one to four digits, each of which is a number between 0 and 9. The lower row shows the province in which the car is registered, printed with smaller characters in the same color as those in the upper row.
A normal Thai license plate is rectangular and consists of two lines, an upper line and a lower line. The upper line is divided into two parts: the first consists of two characters that can be a letter and a number, and the second consists of four numbers. The upper line identifies the car; the lower part shows the name of the province in Thailand in which the car license has been registered. The plate is 15 by 34 centimeters in size, with a colored and embossed outline. The registration ID consists of two series letters followed by a serial number of up to four digits, from 1 to 9999, without leading zeros, e.g. "กข 1" or "กข 1234". A numeral may be added in front of the two letters if the letter pool has been exhausted, as is the case in Bangkok since 2012, giving the format "1กข 1234". Both license plate styles are shown in Figures 2.2 and 2.3. Because of this change, the new plates since 2012 have reduced text size to keep the license plate size small. The province of registration is displayed below the registration ID of the number plate.

Figure 2.2: License plate with 2 letters and 4 serial numbers

Figure 2.3: License plate with a number followed by 2 letters and 4 serial numbers

Thai military vehicle number plates: The Thai military base located in Chonburi, Thailand has vehicles with different license plates, which are totally different from normal Thai license plates. The vehicles include large trucks for transporting goods, cars, and ambulances. Samples of Thai military license plates are shown in Figure 2.4.

Figure 2.4: Examples of Thai military license plates
2.7 Convolutional neural networks
A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks most commonly applied to analyzing visual images. There are a number of different CNNs such as GoogLeNet, ResNet, AlexNet, VGG, etc. Convolutional neural networks are made up of neurons with learnable weights. Each neuron receives some inputs, performs a dot product, and follows it with a non-linearity. There are basically two learning models in image processing tasks: supervised and unsupervised learning. Supervised learning refers to learning through pre-labelled inputs, which act as targets; the goal of supervised training is to reduce the model's overall classification error through correct calculation of the output value for each training example. Unsupervised learning uses a training set that does not contain any labels. A convolutional neural network consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, ReLU (activation) layers, pooling layers, and fully connected layers. Each layer has a specific function, explained as follows.
Figure 2.5: Typical CNN architecture
(source: https://data-flair.training/blogs/cnn-tensorflow-cifar-10/)
Convolution layer

The convolution layer is the main building block of a convolutional neural network. Convolution is the first layer to extract features from an input image; it preserves the relationship between pixels by learning image features over small squares of input data. It is a mathematical operation that takes two inputs, an image matrix and a filter (kernel). As we go deeper into the network, the filters of later convolution layers compute dot products with the outputs of the previous convolution layers.
Pooling layer

Pooling layers reduce the number of parameters when the images are too large. Spatial pooling, also called subsampling or downsampling, reduces the dimensionality of each feature map but retains the important information. Spatial pooling can be of different types, such as max pooling, sum pooling, and average pooling. Max pooling takes the largest element from each region of the feature map, average pooling takes the mean of the elements, and sum pooling takes the total sum of all the elements in the region.
Fully connected layer

After multiple layers of convolution and pooling, we need the output in the form of a class. The convolution and pooling layers only extract features and reduce the number of parameters from the original images. To generate the final output, we apply a fully connected layer whose output size equals the number of classes we need; it is hard to reach that number with convolution layers alone. Convolution layers generate 3D activation maps, while we just need an output indicating whether or not an image belongs to a particular class.
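To make the layer roles concrete, here is a minimal sketch of such a stack in PyTorch. This is an illustrative toy network, not the architecture used in this thesis; the class count of 11 is borrowed from the digits-plus-logo setup described in Chapter 3, and the 32 × 32 input size is an assumption.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal convolution -> ReLU -> pooling -> fully connected stack."""
    def __init__(self, num_classes=11):  # assumption: 10 digits + logo
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution extracts local features
            nn.ReLU(),                                   # non-linearity
            nn.MaxPool2d(2),                             # pooling halves the spatial size
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Fully connected layer maps the flattened features to class scores.
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):                # x: (batch, 3, 32, 32)
        x = self.features(x)             # -> (batch, 32, 8, 8)
        return self.classifier(x.flatten(1))

logits = SimpleCNN()(torch.randn(1, 3, 32, 32))
print(logits.shape)  # torch.Size([1, 11])
```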
2.8 YOLOv3

YOLOv3 is the latest, most advanced model in the YOLO family. Before it appeared, YOLO9000 was the fastest and one of the most accurate algorithms among the existing ones. The improvements made in YOLOv3 exceed the previous methods. This model performs multi-label classification, using independent classifiers to score each detected object against each label. It also predicts bounding boxes at multiple scales: YOLOv3 predicts boxes at 3 different scales and extracts features from those scales, and during training it assigns one bounding box prior to each ground-truth object.

YOLOv2 uses a deep architecture, Darknet-19, a 19-layer network with 11 more layers added for object detection. However, this architecture still lacks important features that are now present in most algorithms. YOLOv3 is designed with the features its predecessor lacked, such as upsampling, skip connections, and residual blocks.

YOLOv3 uses a variant of Darknet, which originally has a 53-layer network trained on ImageNet. For the task of detection, 53 more layers are stacked onto it, giving a 106-layer fully convolutional underlying architecture for YOLOv3. The architecture of YOLO now looks as follows.
Figure 2.6: YOLOv3 architecture
YOLO is a fully convolutional network, and its eventual output is generated by applying a 1 × 1 kernel on a feature map. In YOLOv3, detection is done by applying 1 × 1 detection kernels on feature maps of three different sizes at three different places in the network. The shape of the detection kernel is 1 × 1 × (B × (5 + C)), where B is the number of bounding boxes a cell on the feature map can predict, "5" is for the 4 bounding box attributes plus one object confidence, and C is the number of classes. In YOLOv3 trained on COCO, B = 3 and C = 80, so the kernel size is 1 × 1 × 255.
The first detection is made by the 82nd layer. For the first 81 layers, the image is downsampled by the network, such that the 81st layer has a stride of 32. If we have an image of 416 × 416, the resultant feature map is of size 13 × 13. One detection is made here using the 1 × 1 detection kernel, giving a detection feature map of 13 × 13 × 255. Then, the feature map from layer 79 is passed through a few convolutional layers before being upsampled by 2× to dimensions of 26 × 26. This feature map is depth-concatenated with the feature map from layer 61, and the combined feature map is again passed through a few 1 × 1 convolutional layers to fuse the features from the earlier layer (61). The second detection is then made by the 94th layer, yielding a detection feature map of 26 × 26 × 255.

A similar procedure is followed again: the feature map from layer 91 is passed through a few convolutional layers before being depth-concatenated with the feature map from layer 36. As before, a few 1 × 1 convolutional layers follow to fuse the information from the previous layer (36). The final of the 3 detections is made at the 106th layer, yielding a feature map of size 52 × 52 × 255.
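As a quick check of these numbers, the short sketch below recomputes the three detection feature map shapes for a 416 × 416 input, using the COCO values B = 3 and C = 80 from above:

```python
# Detection feature map shapes for YOLOv3 at the three scales.
B, C = 3, 80          # boxes per cell and classes (COCO values)
depth = B * (5 + C)   # 4 box attributes + 1 objectness + C class scores = 255
for stride in (32, 16, 8):        # strides at detection layers 82, 94, 106
    grid = 416 // stride          # spatial size of the feature map
    print(f"stride {stride}: {grid} x {grid} x {depth}")
# stride 32: 13 x 13 x 255
# stride 16: 26 x 26 x 255
# stride 8: 52 x 52 x 255
```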
CHAPTER 3
METHODOLOGY
3.1 Data collection
My work involved collecting images of Thai license plates from a Thai military base using a surveillance camera placed at an angle chosen to capture vehicle number plates, giving a clear view of the license plates. The camera is accessed remotely, and every day data is sent in the form of videos. The images are collected from two gates of the military base, under all light conditions. We took snapshots of all the images obtained from the camera, collected 600 images of Thai license plates, and trained on them. Testing is done on new videos from other days at the Thai military base which are not included in the training dataset.

The videos are taken from the East gate entrance, the Main gate entrance, and the East gate exit of the Thai military base area. The camera is fixed in a position such that the license plates are seen clearly as a vehicle moves towards it. Data was collected every day for 8 months, and each day produced about 100-140 videos of 3 minutes each. Videos containing Thai digit number plates are selected, each vehicle of that type is captured, and the images are labelled using the LabelImg tool. From all the captured images, first only the license plates are cropped, and later the whole vehicle image including the license plate is used.
Figure 3.1: Data collection for Thai license plates from a video
Images collected from East gate entrance and exit

The Thai military base has many entrance and exit gates through which vehicles continuously move in and out. The dataset is taken from all the possible gates; the pictures referred to here were taken from the East entrance and exit gates. The vehicles coming through them include cars with Thai number plates, which have Thai digits printed on them along with the other characters.

Images collected from main gate entrance and exit

The pictures from the Main gate entrance and exit were taken from different positions with the camera placed at various angles. The data is captured from morning to evening under different light conditions. The cars moving through this gate have Thai as well as normal numbers printed on their plates.
3.2 Annotating Images
We annotated the images using the BBox-label tool in the first stage, but we faced a problem with the labels, as they were difficult to convert to YOLO format, and the BBox-label tool has no option for labelling more than one class at a time. Since labelling each class separately takes more time, we switched to LabelImg, an open-source tool, annotated the images, and saved all the files in txt format. We have 10 digit classes (0-9); including the logo present on the plate, a total of 11 classes appear on the license plate.
Figure 3.2: Image labelled using BBox-label tool
Figure 3.3: Image labelled using LabelImg
Data Preparation

YOLOv3 expects annotations in a specified format: a txt file per image containing the class index and four normalized values. The order of a YOLO-format txt line is: class, x, y, w, h, where

x = absolute x / width of total image
y = absolute y / height of total image
w = absolute width / width of total image
h = absolute height / height of total image

and the absolute quantities are

absolute width = abs(Xmax - Xmin)
absolute height = abs(Ymax - Ymin)
absolute x = Xmin + (absolute width / 2)
absolute y = Ymin + (absolute height / 2)

Figure 3.4: YOLO format of txt files
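A minimal sketch of this conversion in Python, directly following the formulas above (the function name and argument layout are illustrative, not taken from the thesis code):

```python
def to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel bounding box to normalized YOLO (x, y, w, h)."""
    abs_w = abs(xmax - xmin)
    abs_h = abs(ymax - ymin)
    x_center = xmin + abs_w / 2
    y_center = ymin + abs_h / 2
    # Normalize by the image dimensions so all values fall in [0, 1].
    return x_center / img_w, y_center / img_h, abs_w / img_w, abs_h / img_h

# Example: a 100 x 40 box at (250, 300) in a 1920 x 1080 frame, class 3
x, y, w, h = to_yolo(250, 300, 350, 340, 1920, 1080)
print(f"3 {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
```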
3.3 Training and Testing
Since our work uses neural networks, it requires a lot of computational power. We use an Intel Core i7-7700K CPU at 4.2 GHz with an ASUS Prime Z270-A motherboard and an ASUS ROG Strix GTX 1080 Ti, giving 16 GB of system RAM and 11 GB of GPU memory. This setup is well suited to high-end workloads without interruption.

In YOLOv3, there are filters in the convolutional layers immediately before the yolo layers. We need to change the number of these filters, given by the formula

filters = (classes + 5) * 3
The values in the configuration file of YOLOv3 are set to batch size 64, subdivisions 16, learning rate 0.001, saturation 1.5, exposure 1.5, hue 0.1, steps 400000, 450000, scales 0.1, 0.1, and the anchors of the COCO dataset. The width and height are set to 416 × 416, and all input images are resized to this resolution. The maximum number of iterations is set via max batches. When to stop training depends on the average loss, which decreases gradually to a point where there is no further change for a few iterations; training can be stopped there. If training is not stopped at that point, the average loss may rise again after some more iterations and then start decreasing again, and this process goes on (Redmon, Mar 26, 2018).
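As an illustration, the relevant parts of a Darknet cfg file might look like the excerpt below. This is a hedged sketch, not the thesis's actual configuration: the anchor values shown are the COCO defaults, and filters = (12 + 5) * 3 = 51 assumes the 12 classes (logo, license plate, digits 0-9) used later in this work.

```
[net]
batch=64
subdivisions=16
width=416
height=416
learning_rate=0.001
saturation=1.5
exposure=1.5
hue=.1
steps=400000,450000
scales=.1,.1

# ... last convolutional layer before each [yolo] layer:
[convolutional]
size=1
stride=1
pad=1
filters=51          # (classes + 5) * 3 = (12 + 5) * 3
activation=linear

[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=12
num=9
```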
Thresh is the minimum IoU (Intersection over Union) threshold considered during training. IoU is the ratio of the area of overlap to the area of union between two boxes.

Figure 3.5: Intersection over union
(source: https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/)
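A minimal sketch of the IoU computation for axis-aligned boxes given as (xmin, ymin, xmax, ymax); this is an illustrative helper, not code from the thesis:

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)   # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.1428...
```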
The input images are resized to a width and height of 416. The learning rate is set to 0.001 and the saturation and exposure values to 1.5. Then training is started, and progress can be seen in the terminal window. There is no fixed end point for the number of iterations; we must determine when the network's detection performance has stabilized. This depends on the average loss, which is very high at the starting iterations and keeps decreasing as the iteration number grows. We watch this in the terminal, and when it is nearly zero and remains constant for the next few epochs, we can stop training.

The weights are saved every 100 iterations, and this interval can be changed manually as required. As soon as we stop training, we check the mean average precision (mAP), which measures the accuracy of the model. To choose precisely, we check all the saved weights and take the weights with the highest precision to produce the detection output, which is then tested on a test image.
Figure 3.6: Screenshot taken from terminal during training of images
Flow chart for training and testing the YOLOv3 model on custom data

Figure 3.7: Representation of algorithm

Flow chart for displaying text on an image

The function is modified to display the detected classes' text at the top of the image to increase readability.

Figure 3.8: Display text on image

Flow chart for displaying text on video frames

Figure 3.9: Display text on video frame
CHAPTER 4
EXPERIMENTAL RESULTS
4.1 Results
After testing the images, we can see the output when an image is given. For every weights file we check the precision and confidence percentage for each class and assess the performance of the system. From all the results, we can clearly observe the system's performance. The accuracy so far is good: for the license plates the accuracy is about 85 percent, and for the digits it turns out to be 83 percent. The videos were collected over the whole day, but in some cases they include miss-predictions.

Precision and recall are useful ways to measure the quality of predictions. In pattern recognition, precision, also known as positive predictive value, is the fraction of relevant instances among all retrieved instances:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

TP: case was positive and predicted positive
FP: case was negative but predicted positive
FN: case was positive but predicted negative
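A small sketch computing these metrics from raw counts; the counts plugged in below are taken from the 6000-iteration row of Table 4.5 later in this chapter:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

p, r = precision_recall(tp=550, fp=62, fn=61)    # counts from Table 4.5
print(f"precision = {p:.3f}, recall = {r:.3f}")  # precision = 0.899, recall = 0.900
```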
For detection in video, we added alpha-numeric plates along with Thai military license plates, because in the given video both kinds of vehicles move through the gate continuously. To get better detection, the model was also trained on normal digits for several iterations and then tested on video. The video runs at 30-40 fps and the result is saved to a file. The accuracy for images of Thai military plates is 85%, whereas after adding alpha-numeric plates it increased to 87%. There are also some miss-predictions in all cases: some license plates are not clear, and the digits are not clearly visible on all the license plates. Final testing is done on newly collected videos which are not included in the training part. The output results from each testing case are shown below.

Figures 4.1(a) and 4.1(b) show the output when trained for single digit detection. Figures 4.2(a), 4.3(a), 4.2(b), and 4.3(b) show the license plate detected in a car image and the confidence percentage of the detected license plate in the terminal. Figures 4.4(a) and 4.5(a) show the output of all the Thai digits detected, including the logo, in a cropped licence plate, and 4.4(b) shows the confidence of each digit predicted. Figure 4.5(b) displays the wrong detections or miss-predictions taking place. Figure 4.6 shows dual detections for a single license plate, and Figure 4.7 was taken when we input a full car image and the digits are detected.
Figure 4.1 (a): Screenshot from prediction window for detecting single digit
Figure 4.1 (b): Screenshot from prediction window for detecting single digit

Figure 4.2 (a): License plate detection from an image
Figure 4.2 (b): Confidence of predicted image in terminal

Figure 4.3 (a): License plate detection from an image
Figure 4.3 (b): Confidence of predicted image in terminal

Figure 4.4 (a): Detection of all digits in license plate
Figure 4.4 (b): Confidence of each digit observed in terminal

Figure 4.5 (a): Detection of all digits in license plate
Figure 4.5 (b): Miss-prediction in detecting digits

Figure 4.6: Wrong prediction of license plate in an image
Figure 4.7: Detection tested on full car image
Results of predictions from various gates and confidence scores of detected digits

Figure 4.8: Screenshot of terminal window after prediction

These images are taken of the Thai vehicles coming through different gates; the digits along with the license plate are detected, and the confidence percentage for each digit is captured as a screenshot from the terminal window.
Figure 4.9: Result tested under different light conditions

Figure 4.10: Missed predictions of digits

Figure 4.11: Confidence of predicted digits of an image from the East Entrance gate

Figure 4.12: Results displaying the whole license plate number on the image

The detected digits are combined, and to see the whole license plate number, code is written that draws a rectangular box and displays the detected number inside it using OpenCV drawing functions.
Table 4.1: Testing videos on new data and results obtained

| Testing video | Number of cars in video | Detected license plates | Miss-predictions |
|---|---|---|---|
| Test video 1 | 8 | All license plates detected | - |
| Test video 2 | 6 | All plates detected | - |
| Test video 3 | 12 | 10 license plates detected | Cars with red license plate; numbers not detected |
| Test video 4 | 8 | All license plates detected | - |
| Test video 5 | 9 | 4-5 license plates detected | Pole obstructing the camera, and a sign board detected as a license plate |
| Test video 6 | 22 | 19 plates detected correctly | 1 plate not clear but detecting random numbers, 1 plate not clear |
| Test video 7 | 5 | All digits in LP detected correctly | - |
| Test video 8 | 2 | All license plates detected | - |
While testing on videos, each digit has its own confidence value, which keeps changing based on the detection. As the video runs continuously, the license plate number is displayed in the left corner of the frame as long as the car is visible in the frame. There are 12 classes in total: logo, license plate, and the digits 0-9. The testing is done on several videos in which Thai-plate cars are present.
Table 4.2: Overall accuracy on videos tested on new data

| Test video | Total cars | Correct | Incorrect | Accuracy (%) |
|---|---|---|---|---|
| 1 | 8 | 8 | - | 100 |
| 2 | 6 | 6 | - | 100 |
| 3 | 12 | 10 | 2 | 83 |
| 4 | 22 | 21 | 1 | 95 |
| 5 | 9 | 7 | 2 | 77 |
| 6 | 8 | 8 | - | 100 |
| 7 | 5 | 5 | - | 100 |
| 8 | 3 | 3 | - | 100 |
| 9 | 5 | 4 | 1 | 80 |
| 10 | 4 | 3 | 1 | 75 |
| 11 | 7 | 7 | - | 100 |
| 12 | 7 | 6 | 1 | 85 |
| 13 | 3 | 3 | - | 100 |
| 14 | 7 | 6 | 1 | 83 |
Data taken from the military base on another day is used for final testing; each video is 3 minutes long. The total number of cars moving in each video is noted, and license plates which are not detected are counted as miss-predictions. Each plate has a minimum of 4 digits for normal plates and 6 for Thai plates. Each video's accuracy is noted manually based on true detections, and the overall accuracy is around 91% for the tested videos.
To display the full number on the image, a rectangular box is first drawn, using OpenCV drawing functions, with the required dimensions. The detected data is then printed inside the rectangular box: the detected digits are stored in a string, and the labels in the string are used to print the detected output inside the box. The color of the rectangle can be changed by giving the required fill values.
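A minimal sketch of this overlay with OpenCV; the file names, coordinates, and colors are illustrative, not taken from the thesis code:

```python
import cv2

frame = cv2.imread("car.jpg")   # hypothetical input image path
plate_text = "1234"             # detected digit labels joined into one string

# Draw a filled rectangle as a background, then print the text inside it.
# (OpenCV's Hershey fonts cover ASCII only; Thai glyphs would need e.g. PIL.)
cv2.rectangle(frame, (10, 10), (220, 60), (0, 0, 255), thickness=-1)
cv2.putText(frame, plate_text, (20, 48), cv2.FONT_HERSHEY_SIMPLEX,
            1.2, (255, 255, 255), 2)
cv2.imwrite("car_labelled.jpg", frame)
```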
Figure 4.13: Graphical representation of average loss vs. number of iterations for the license plate

The training and validation accuracy is calculated by recording the values every 1000 iterations and plotting them manually. The training curve is shown in blue and the validation curve in red.
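A sketch of such a manual plot with matplotlib; the loss values below are placeholders for the values read off the terminal, not the thesis's actual numbers:

```python
import matplotlib.pyplot as plt

iterations = [1000 * i for i in range(1, 11)]
avg_loss = [8.1, 3.2, 1.5, 0.9, 0.6, 0.45, 0.38, 0.33, 0.30, 0.29]  # placeholders

plt.plot(iterations, avg_loss, "b-", label="average loss")
plt.xlabel("Iterations")
plt.ylabel("Average loss")
plt.title("Average loss vs. number of iterations")
plt.legend()
plt.savefig("avg_loss.png")
```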
Table 4.3: Precision values of cropped license plates with Thai digits

| Number of iterations | Precision (%) |
|---|---|
| 1000 | 90 |
| 2000 | 93 |
| 3000 | 93 |
| 4000 | 92 |
| 5000 | 90 |
| 6000 | 89 |
| 7000 | 89 |
| 8000 | 90 |
| 9000 | 89 |
| 10000 | 89 |
The license plates containing only Thai digits are cropped, each single digit is labelled, and the model is trained to 13800 iterations. The result is tested by giving random test images, and the output precision is noted every 1000 iterations.
Table 4.4: Precision values of Thai digits along with alpha-numeric numbers

| Number of iterations | Precision (%) |
|---|---|
| 1000 | 49 |
| 2000 | 85 |
| 3000 | 86 |
| 4000 | 80 |
| 5000 | 85 |
| 6000 | 87 |
| 7000 | 87 |
| 8000 | 86 |
| 9000 | 85 |
| 10000 | 86 |
The vehicle images contain both Thai digit license plates and alpha-numeric numbers. They are trained up to 15000 iterations and tested by giving a test image, and these weights are used when testing on video, since there are more normal cars than Thai military cars in the videos.
Table 4.5: True positive, false positive, false negative comparison table for YOLOv3

| Iterations | TP | FP | FN |
|---|---|---|---|
| 1000 | 384 | 273 | 227 |
| 1500 | 507 | 101 | 104 |
| 2000 | 544 | 76 | 67 |
| 2500 | 544 | 89 | 67 |
| 3000 | 545 | 69 | 66 |
| 3500 | 547 | 65 | 64 |
| 4000 | 526 | 77 | 85 |
| 4500 | 540 | 76 | 71 |
| 5000 | 548 | 69 | 63 |
| 5500 | 545 | 71 | 66 |
| 6000 | 550 | 62 | 61 |
| 6500 | 552 | 63 | 59 |
| 7000 | 549 | 65 | 62 |
| 7500 | 543 | 69 | 68 |
| 8000 | 549 | 70 | 62 |
| 8500 | 548 | 66 | 63 |
| 9000 | 546 | 73 | 65 |
| 9500 | 549 | 66 | 62 |
| 10000 | 548 | 56 | 63 |
The above is the confusion table for all the digits predicted in an image; the values are noted every 1000 iterations. It includes the true positives in detecting the image, along with the false positives and false negatives.
CHAPTER 5
CONCLUSION AND RECOMMENDATIONS
5.1 Conclusion
Thai digit recognition is performed by collecting data from different environments in a real-time scenario. It is tested on images as well as videos under various light conditions.
5.2 Recommendations
This work can be extended to recognize the Thai characters along with the digits, and to other languages as well.
REFERENCES
1. U. Yadav, S. Verma, D. K. Xaxa and C. Mahobiya, "A deep learning based character
recognition system from multimedia document," 2017 Innovations in Power and
Advanced Computing Technologies (i-PACT), Vellore, 2017.
2. Richard G. Casey and Eric Lecolinet, "A survey of methods and strategies in
character segmentation," IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. 18, No. 7, July 1996.
3. Anil K. Jain and Torfinn Taxt, "Feature extraction methods for character recognition -
A survey," Pattern Recognition, vol. 29, no. 4, pp. 641-662, 1996.
4. Z. Selmi, M. Ben Halima and A. M. Alimi, "Deep Learning System for Automatic
License Plate Detection and Recognition," 2017 14th IAPR International Conference
on Document Analysis and Recognition (ICDAR), Kyoto, 2017.
5. C. Pornpanomchai and N. Anawatmongkon, "Thai License Plate Detection from a
Video Frame," 2009 WRI Global Congress on Intelligent Systems, Xiamen, 2009.
6. Shen-Zheng Wang and Hsi-Jian Lee, "Detection and recognition of license plate
characters with different appearances," Proceedings of the 2003 IEEE International
Conference on Intelligent Transportation Systems, 2003.
7. Pisit Phokharatkul and Chom Kimpan, "Recognition of handprinted Thai characters
using the cavity features of character based on neural network," Circuits and Systems,
1998.
8. S. Z. Masood, G. Shu, A. Dehghan, and E. G. Ortiz, “License plate detection and
recognition using deeply learned convolutional neural networks,” CoRR, vol.
abs/1703.07330, 2017.
9. S. Uchida, S. Ide, B. K. Iwana and A. Zhu, "A Further Step to Perfect Accuracy by
Training CNN with Larger Data," 2016 15th International Conference on Frontiers in
Handwriting Recognition (ICFHR), Shenzhen, 2016.