Automatic Road Pavement Crack Detection using SVM

Afonso Guerlixa Carvalhido Salvador Marques

Dissertation to obtain a Master's Degree in

Electrical and Computer Engineering

Jury

President: Prof. Fernando Duarte Nunes

Supervisor: Prof. Paulo Luis Serras Lobato Correia

Member: Prof. João Pedro Afonso Oliveira da Silva

October of 2012

Acknowledgment

I would like to thank my supervisor for all of his support that was fundamental for the

development and quality of this dissertation. I also would like to thank professor Henrique

Oliveira for all of his assistance in the many situations that arose during the dissertation. I thank my

father and mother for giving me the opportunity to complete the master course in electrical

engineering and also my friends as well as my colleagues for always giving me the strength and

courage to proceed.

Abstract

To keep a high road surface quality and road safety, an appropriate maintenance policy needs

to be enforced, as soon as cracks start to appear. Since the traditional way of visually detecting

road cracks by a skilled technician is very time-consuming, this dissertation presents an

automatic solution that increases the speed and efficiency of road surface pavement

analysis and reduces both the technician's effort and the subjectivity of the results. The

proposed system starts by pre-processing the database images, smoothing their texture and

enhancing any existing cracks, followed by the extraction of descriptive features. Here,

each image is divided into several non-overlapping blocks and each block originates a feature

vector. A supervised learning algorithm, the support vector machine (SVM), is then used to

detect cracks. For this purpose, the LIBSVM software was used to train and test the system.

After labeling each testing set block as crack or non-crack, a post-processing technique is

applied to remove isolated crack blocks.

Two different databases are used for testing purposes, one that contains easier crack examples

and another with more challenging crack images, both databases being acquired with the

camera optical axis orthogonal to the pavement. Several experiments were conducted for each

database. In each experiment the classifier output is compared with the respective ground-truth,

segmented by an expert. The results achieved show high recall values for both databases and

also a high precision value for the first database, and are competitive with the best

results reported in the literature.

Keywords: crack detection; support vector machine; features; image processing.

Summary

As soon as cracks appear, appropriate maintenance must be applied to keep road quality and

safety high. Since the traditional way of detecting cracks, by a technician, is very

time-consuming, this dissertation proposes an automatic solution, thereby increasing the speed

and efficiency of road pavement analysis and reducing both the technician's effort and the

subjectivity of the results. The proposed system starts by pre-processing the images, smoothing

their texture and enhancing existing cracks, followed by the extraction of descriptive features.

In this dissertation each image is divided into blocks and each block originates a feature

vector. A machine learning algorithm, the support vector machine (SVM), is then used to detect

potential cracks, through the LIBSVM library. After classifying the testing set blocks as

containing cracks or not, post-processing is used to remove all isolated crack blocks.

Two different databases were used in this dissertation, one containing easier crack examples

and the other more difficult ones, both acquired with the camera optical axis orthogonal to the

pavement. Several experiments were conducted for each database, and in each experiment the

system output is compared with the respective ground-truth, provided by the expert. The results

achieved show a high recall for both databases and a high precision for the first one, on par

with the best results in the literature.

Keywords: crack detection; support vector machine; features; image processing.

Contents

1 Introduction

1.1 Motivation

1.2 Purpose

1.3 Contributions

1.4 Structure

2 State of the art

2.1 Introduction

2.2 Pre-processing

2.3 Feature extraction

2.3.1 Pixel-based

2.3.2 Block-based

2.4 Crack detection

2.4.1 Pixel-based methods

2.4.2 Block-based methods

2.5 Crack classification

3 Proposed System

3.1 System architecture

3.2 Databases

3.3 Pre-processing

3.4 Feature extraction

3.4.1 Features selected for road crack detection

3.4.2 Statistical properties of the selected features

3.5 Classifier

3.6 Post-processing

4 System evaluation

4.1 Test conditions - Road1

4.2 Test conditions - Road2

4.3 Performance measures

4.4 Experimental results

5 Conclusions and Future Work

6 References

List of figures

Figure 1 – Example of ten different types of pavement surface made of asphalt and concrete materials

Figure 2 – Four images representing four different types of crack: (i) longitudinal crack, (ii) transversal crack, (iii) miscellaneous crack and (iv) alligator crack.

Figure 3 – Example of an input image (top) and the respective ideal block based output expected by the system (bottom)

Figure 4 – LRIS system

Figure 5 – Complete architecture of an automatic crack detection system

Figure 6 – Example of an input image (top) and the pre-processed image with histogram equalization (bottom) [13]

Figure 7 – Thresholding example. The original image is shown on the left and the thresholding output is represented on the right

Figure 8 – Example of a block crack type.

Figure 9 – SVM feature space example that selects the support vectors to separate the two pattern classes through a hyperplane, taken from [38].

Figure 10 – 2D standard deviation for crack classification with an example of a longitudinal crack represented by the point L1, taken from [6].

Figure 11 – Block diagram of the proposed crack detection system.

Figure 12 – Example of two images of the two different databases considered

Figure 13 – Presentation of the several pre-processing configurations applied to the image database, namely top-hat, mean filter followed by top-hat and min-filter followed by adaptive histogram equalization (AHE).

Figure 14 – Top-hat filter representation. All the image pixel intensities above api will have a grey level equal to api

Figure 15 – Mean histogram of the cracks (left) and non-cracks (right) of Road1. The vertical axis represents the mean value and the horizontal axis the pixel value

Figure 16 – Example of the three pre-processing combinations. Top-left is the original image, top-right the top-hat image, bottom-left mean filter followed by top-hat and bottom-right min-filter followed by adaptive histogram equalization

Figure 17 – Histograms of the six features of crack (red) and non-crack blocks (green): a) minimum value; b) mean intensity; c) variance; central moments of order d) 3, e) 4 and f) 5. The vertical axis corresponds to the probability of each pattern to occur. The horizontal axis corresponds to the feature value. Note that these histograms are normalized

Figure 18 – Scatter diagrams of six feature pairs for the first database: a) mip and third order, b) mip and fourth order, c) mip and fifth order, d) third and fourth order, e) third and fifth order, f) fourth and fifth order. The green corresponds to non-crack blocks while the red represents the crack blocks. The horizontal axis corresponds to one of the features used (the first in each paragraph) and the vertical axis corresponds to the other (the second).

Figure 19 – Example that shows the improvement of the post-processing technique. The top image corresponds to the ground-truth, the middle image to the classifier output and the bottom image to the classifier output after post-processing.

Figure 20 – Example that eliminates crack blocks using a post-processing technique, leading to worse results. The top image corresponds to the ground-truth, the middle image to the classifier output and the bottom image to the classifier output after post-processing.

Figure 21 – Example of a doubtful case where the classifier detects blocks that have crack evidence but are not classified as such by the ground-truth. The top image corresponds to the ground-truth, the middle image to the classifier output and the bottom image to the classifier output after post-processing

Figure 22 – Non-crack image examples that were rightly classified after post-processing. The right column images are the ground-truth (in this case it coincides with the original image) and the left column images the classifier output. After post-processing the block classified as crack in each image is eliminated.

Figure 23 – Performance for different training set sizes (training images are randomly selected). The horizontal axis represents the number of images of the training set. The vertical axis represents the evaluation metric (recall for the first graph and f-measure for the second graph).

List of tables

Table 1 – Mutual information and correlation coefficient of the proposed features for the first database

Table 2 – Precision (top), recall (middle) and f-measure (bottom) of Road1

Table 3 – Precision (top), recall (middle) and f-measure (bottom) of Road1

Table 4 – Results of isolated features with and without pre-processing for the first database.

Table 5 – Results of isolated features with pre-processing for the first database

Table 6 – Best joint recall (top) and best joint f-measure (bottom) achieved for the first database.

Table 7 – Comparison of the literature results with the results achieved in the developed system

List of abbreviations

BPNN – Back propagation neural network

GaMM – Gauss Markovian modeling

JAE – Junta autónoma das estradas

KNN – K-nearest neighbor

NN – Neural network

NSCT – Nonsubsampled contourlet transform

OAA – One against all

OAO – One against one

PDE – Partial differential equation

RBF – Radial basis function

SVM – Support vector machine

1 Introduction

1.1 Motivation

Roads have an important role in modern societies, allowing a comfortable, fast and cheap way

to travel from one place to another. Roads can connect several different places not only inside

the cities, shortening their distance, but also between cities and villages as well as between

different countries, easing the mobility of people. They also have a strong impact on

economic growth, since roads promote tourism and provide a quick way for the

distribution and trading of goods. As a result, many vehicles use them every day, causing a

continuous degradation of the road pavement surface. If an appropriate maintenance policy is

not applied, the quality of the road pavement surface degrades, compromising road safety.

Road pavement surfaces are often composed of asphalt, although there may be other types,

notably based on concrete materials [1]. Several distinct types of pavement can be identified,

considering their texture composition. Figure 1 (taken from [1]) illustrates 10 different types of

pavement, 7 of which are composed of asphalt and 3 of concrete materials.

Figure 1 – Example of ten different types of pavement surface made of asphalt and concrete

materials.

As observed in Figure 1, pavement surface images present different texture characteristics. For

instance, the granulation size and grey level can vary drastically from one pavement type to

another. Also the pavement granulation distribution can change. Pavements can also be

differentiated by their degree of striation [1].

There can be several types of distresses in road pavement surfaces. The first hint of

deterioration and the most common distress found are cracks [2]. A crack is a thin and long

road distress, characterized by its dark visual appearance. There exist several types of cracks,

with different severity levels. Longitudinal, transversal, miscellaneous and alligator are the main

crack types in road pavement surfaces, according to the former “Junta Autónoma das Estradas”

- JAE (see the Portuguese Distress Catalogue [3]). Examples of these types of cracks are

presented in Figure 2.

Figure 2 – Four images representing four different types of crack: (i) longitudinal crack, (ii)

transversal crack, (iii) miscellaneous crack and (iv) alligator crack. The first three images belong

to the database used and the fourth was extracted from Google Images.

Whenever cracks start to show on road pavement surfaces, it is an indicator that the quality of

the pavement is degrading and maintenance is needed. Timely maintenance increases the road quality,

saving a considerable amount of money on restoration (compared to a case of further distress

progression). Besides cracks, other pavement distress types exist [4], but this dissertation

will be focused on cracks, since this is the most common road pavement distress and the first

type of road degradation to appear.

Several cameras can be placed in critical locations in order to constantly monitor the roads,

mostly in view of traffic surveillance, but that alone is not enough to supervise and keep the

road pavement quality high. Since cameras only cover a small part of the roads and may not

have enough resolution to allow the detection of cracks, an alternative is needed to gather

images and register the conditions of the pavement, typically involving a skilled technician

travelling along the road [5]. When road surface images are captured, skilled technicians later

have to analyze each of them, determine the existence of pavement distresses

and classify their type. This process is very time-consuming and requires a considerable effort to

manually analyze the full set of acquired images [2]. It can also happen that two inspectors

have different opinions when classifying the same distress [6].

To increase the speed and efficiency of road surface pavement analysis, several automatic

crack detectors have already been developed [7]. In the literature, most automatic distress

detectors aim to detect cracks and only a few target other types of distress,

since cracks are often the first distress to appear when roads start to degrade. So, if an

automatic solution is developed to detect pavement distresses such as cracks, as soon as they

appear, maintenance measures could be taken immediately, preventing a further degradation of

the pavement and keeping a higher road quality. Additionally, automatic solutions are less

expensive and more comfortable than traditional road pavement monitoring procedures and

would considerably reduce the effort required for analyzing the images manually.

All the existing automatic analysis techniques [8] present some limitations: they may miss cracks

in some images and may occasionally detect a crack when there is none. A general

solution has not yet been found. However, several of the already developed techniques perform

well for specific types of road pavements [9]. The main drawbacks are due to a high

dependence on the road pavement texture and also on the image quality. Therefore,

determining the pavement type is important to improve automatic crack detection results [1].

The weak representation of the signal (crack) to be detected, the weak contrast between the

pavement and the crack and the possibility that the texture of the road may hide the crack can

also hamper the task of detecting cracks. To minimize these limitations, automatic solutions

typically involve an image pre-processing stage. Pre-processing techniques aim to make the

image more uniform, without affecting the ability to identify the crack areas, and favoring the

contrast between the cracks and the pavement. By doing so, crack detection becomes easier

and faster.

Another critical issue to efficiently detect the presence of cracks is the set of features used to

describe the cracks in images, i.e., the crack properties chosen to help the search and detection

of cracks. The selection of features can be critical for the subsequent classification stage and,

therefore, for the overall system performance.

1.2 Purpose

This dissertation proposes an automatic system capable of detecting cracks from previously

captured road pavement images, with the intention of achieving a performance that can compete

with the best techniques reported in the literature. An objective evaluation methodology is also

adopted, providing quantitative evaluation results, thus easing the comparison against

alternative methods.

The proposed solution builds upon the techniques reported in the literature that produce the

best results. A solution focused on pattern recognition techniques is presented to detect and

classify cracks in images. Pattern recognition techniques typically involve the usage of

classifiers, which can operate in a supervised or unsupervised manner. The approach followed

in this dissertation is based on supervised learning, taking a set of selected road images

containing cracks as a training set, from which the system learns the crack characteristics. The

larger and richer the information contained in the training set, the more likely the system is

to learn the crack characteristics efficiently and identify cracks correctly later. To verify the system

performance in detecting cracks after training, a different set of road images, the testing set, is

used. The system analyzes the testing set and marks the regions that correspond to cracks.

Later the system performance is determined by comparing the system results with the expected

ones, i.e., the so-called ground-truth information, which can be provided by manual image

analysis for a subset of the testing set. In this dissertation, support vector machines (SVM) are

adopted for the classification stage. In Figure 3 an example of an input image and the

respective ideal block based output expected by the system (ground-truth) is shown.

Figure 3 – Example of an input image (top) and the respective ideal block based output

expected by the system (bottom).
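
The block based ground-truth illustrated in Figure 3 can be sketched as follows. This is only a minimal illustration, not the procedure used to build the databases: it assumes a pixel-level binary mask is available and that a block is labeled as crack when it contains at least one crack pixel. The function name `block_labels`, the block size and the toy mask are hypothetical.

```python
def block_labels(mask, block_size):
    """Label each non-overlapping block 1 (crack) if it holds any crack pixel.

    Assumption: `mask` is a list of rows with 1 for crack pixels and 0 otherwise.
    Incomplete blocks at the right and bottom borders are ignored.
    """
    rows, cols = len(mask), len(mask[0])
    labels = []
    for top in range(0, rows - block_size + 1, block_size):
        row_labels = []
        for left in range(0, cols - block_size + 1, block_size):
            has_crack = any(
                mask[r][c]
                for r in range(top, top + block_size)
                for c in range(left, left + block_size)
            )
            row_labels.append(1 if has_crack else 0)
        labels.append(row_labels)
    return labels


# Toy 4x6 pixel-level ground-truth (hypothetical data): a short diagonal crack.
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
toy_labels = block_labels(toy_mask, block_size=2)
```

With 2x2 blocks the toy mask yields a 2x3 grid of labels, in which only the blocks crossed by the diagonal are marked as crack.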

1.3 Contributions

The major contribution of this dissertation is the development of an automatic crack detection

system in road pavement surface images that achieves good results, when compared with the

literature results, using combinations of features commonly applied in the literature (mean and

variance), together with features not very frequently seen (third, fourth and fifth order moments)

as feature sets.
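
As an illustration of the kind of feature vector referred to above, the sketch below computes, for one block of grey levels, the minimum value, the mean, the variance and the central moments of order three, four and five (the six features listed in the caption of Figure 17). It is a simplified sketch: the normalization by the number of pixels and the function name `block_features` are assumptions, not taken from the system described in Chapter 3.

```python
def block_features(block):
    """Return (minimum, mean, variance, m3, m4, m5) for one block of grey levels.

    Assumption: central moments are normalized by the number of pixels.
    """
    pixels = [p for row in block for p in row]
    n = len(pixels)
    mean = sum(pixels) / n

    def central_moment(order):
        # k-th central moment: average of (p - mean)^k over the block.
        return sum((p - mean) ** order for p in pixels) / n

    return (
        min(pixels),
        mean,
        central_moment(2),  # variance
        central_moment(3),
        central_moment(4),
        central_moment(5),
    )


# Toy 2x2 block (hypothetical grey levels).
toy_features = block_features([[1, 2], [3, 4]])
```

For a symmetric toy block such as this one, the odd-order central moments vanish, which hints at why those moments respond to the asymmetry that dark crack pixels introduce in a block histogram.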

Since the pre-processing did not always improve the results, another contribution of this

dissertation is the use of specific pre-processing for specific features, i.e., the conjunction of

several different pre-processing techniques with different features to train and test a system.

The best feature set tested provides a recall of 98.85%, a precision of 89.4% and an f-measure

of 93.09% for the first database considered, Road1, and a recall of 94.29%, a precision of

26.99% and an f-measure of 40.37% for the second database considered, Road2. In addition,

the best Road1 conjunction result presents a recall of 99.04%, a precision of 94.09% and an

f-measure of 94.09%. In the literature, Oliveira and Correia in [6] present a recall of 97% and an

f-measure of 94.7%. Lower but similar results are presented in [10], with a recall of 96.3%, a

precision of 86.9% and an f-measure of 93.8%. A recall of 93.96%, a precision of 90.70% and an

f-measure of 91.95% are stated in [11]. A recall of 96.75% is achieved in the same paper. In [5]

the best recall was 95.44% for the first database and 85.44% for the second database. Although

other papers address this topic, the results are often of qualitative nature only, making those

results harder to compare. The output of such systems is often to classify each image as crack

or non-crack. Moreover, other papers distinguish several crack types not explicitly addressed in

this dissertation.

Analyzing the reported results, the best recall values for both databases used in this dissertation

can compete with the best recall results stated in the literature, while the best precision obtained

for the first database can also compete with the best precision results reported in the literature.
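
For reference, the three measures quoted above can be computed from block labels as sketched below. The sketch assumes the standard definitions of precision, recall and f-measure (harmonic mean of the two) over a flat list of block labels; how the scores are aggregated over the images of a database is not stated in this chapter, so the function name `detection_scores` and the toy labels are purely illustrative.

```python
def detection_scores(predicted, ground_truth):
    """Return (precision, recall, f_measure) for binary block labels (1 = crack)."""
    pairs = list(zip(predicted, ground_truth))
    true_pos = sum(1 for p, g in pairs if p == 1 and g == 1)
    false_pos = sum(1 for p, g in pairs if p == 1 and g == 0)
    false_neg = sum(1 for p, g in pairs if p == 0 and g == 1)

    # Guard against empty denominators (no detections or no crack blocks).
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy example (hypothetical labels): 3 hits, 1 false alarm, 1 missed crack block.
toy_predicted = [1, 1, 1, 0, 0, 0, 1, 0]
toy_truth = [1, 1, 0, 0, 0, 0, 1, 1]
toy_precision, toy_recall, toy_f = detection_scores(toy_predicted, toy_truth)
```

A high recall with a low precision, as reported for Road2, corresponds to a detector that misses few crack blocks but raises many false alarms.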

1.4 Structure

This dissertation has the following structure. Chapter 1 motivates the problem and describes the

main goal and structure of this dissertation. Chapter 2 presents the state of the art and the most

important techniques in the literature. A block diagram illustrating the main steps of an

automatic solution is presented, to clarify the type of techniques related to each of the automatic

crack detection stages. Chapter 3 describes the proposed system architecture. The pre-

processing techniques and the features selected that led to good results are first presented,

followed by a description of the training and testing stage of the SVM classifier. Finally the post-

processing used to improve the system performance is also described. In Chapter 4, the test

conditions for each database (Road1 and Road2) are presented along with the results

achieved. Finally, Chapter 5 compares the system results with those of similar techniques in

the literature, draws some conclusions and discusses possible future work that can improve

the overall system accuracy.

2 State of the art

2.1 Introduction

Cracks are the first road deterioration sign and most common distress type found in road

pavement surfaces. They appear as thin, continuous and dark structures which can be visually

detected and distinguished from the road texture.

Crack detection plays an important role in the maintenance of road networks. For several

decades, pavement surface distress was monitored by visual inspection, a costly

and time-consuming task [1], [6]. To increase the speed and efficiency of road surface

pavement analysis, several automatic crack detectors have already been developed.

Advanced systems can acquire road images more rapidly, in a safer way and with better quality

than traditional manual annotation [12]. For that purpose, a camera is typically attached to the inspection

vehicle. Once the database is fully acquired, the images can be analyzed offline by an

automatic crack detector. Figure 4, taken from [9], shows the LRIS system, composed of two

high-resolution linescan cameras together with high-power lasers.

Figure 4 – LRIS system

The current Chapter does not focus on image acquisition but rather addresses previous works

on crack detection, emphasizing the main approaches reported in the literature, their strong

points and weaknesses. Figure 5 shows the general crack detection system architecture, where

I represents an input image, J the pre-processed image, followed by the n-dimensional feature vector,

the label attributed, in {0,1} (crack or non-crack), and the crack type assigned to any detected

cracks.

[Block diagram with the stages: Pre-processing, Feature extraction, Crack detection and Crack classification]

Figure 5 – General architecture of an automatic crack detection system

The architecture presented in Figure 5 includes four main steps: i) pre-processing, ii) feature

extraction, iii) crack detection and iv) crack classification. In the first stage, the input image is

filtered to remove noise and to enhance crack visual features. Then, the selected features are

extracted from the pre-processed images. Based on the computed feature values, each image

pixel (or each image pixel block) is classified as containing cracks or not, by the crack detection

algorithm. Finally, the detected cracks can be classified according to their geometric properties.

Some papers address only the detection of cracks in each image, while others are solely

interested in evaluating the crack type. Only a limited number of papers in the literature

implement the whole architecture presented in Figure 5.
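
The data flow through the four stages can be summarized by the sketch below. It only illustrates how the stages chain together: each stage is passed in as a function, and the stand-in stages used in the toy call (identity pre-processing, block mean as the single feature, a fixed darkness threshold and a placeholder crack type) are hypothetical and do not correspond to any method from the literature.

```python
def run_pipeline(image, preprocess, extract_features, detect, classify):
    """Chain the four stages of Figure 5; each stage is supplied by the caller."""
    preprocessed = preprocess(image)                  # i) pre-processing
    feature_vectors = extract_features(preprocessed)  # ii) one feature vector per block
    labels = [detect(vector) for vector in feature_vectors]  # iii) 1 = crack, 0 = non-crack
    # iv) classification is only meaningful when at least one crack was detected.
    crack_type = classify(labels) if any(labels) else None
    return labels, crack_type


# Hypothetical stand-in stages, for illustration only.
toy_image = [[200, 210], [40, 60], [190, 205]]  # each row plays the role of one block
toy_labels, toy_type = run_pipeline(
    toy_image,
    preprocess=lambda img: img,  # no filtering
    extract_features=lambda img: [[sum(row) / len(row)] for row in img],  # block mean
    detect=lambda vector: 1 if vector[0] < 100 else 0,  # dark block taken as crack
    classify=lambda labels: "unclassified",  # placeholder crack type
)
```

Papers that implement only part of the architecture correspond to calls in which one of the stages is trivial, for instance detection without any subsequent crack type assignment.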

The following sections describe the most relevant and interesting techniques presented in the

literature for each of the considered stages. In this dissertation, the papers

that only distinguish the several crack types are discussed in the crack detection section. The

crack classification block is relevant only for solutions that use both crack detection, to detect

cracks, and crack classification, to label the type of each already detected crack.

2.2 Pre-processing

During the acquisition process, using a photographic or a video camera, the image often

becomes corrupted by random noise (e.g., camera noise) which hampers the detection of road

distresses. In addition, the illumination conditions may change for different image locations,

resulting in a road image that is not homogeneous. Illumination changes between consecutive

images may also occur. Moreover, a road pavement surface image often shows a noticeable

non-uniform texture which can increase the difficulty in detecting road distresses. Shadows, tire

marks or oil stains can also interfere with the automatic crack detection procedure.

The role of the pre-processing step consists of removing, as much as possible, the noise and

smoothing the road texture, while keeping the ability to identify any existing cracks.

Depending on the pre-processing techniques selected, the overall crack detection results can

be considerably improved, also speeding up the image processing.

A wide variety of pre-processing methods have been used in the literature. For instance Oliveira

and Correia in [6] apply a normalization technique to reduce non-uniform background

illumination. For that purpose, a mean value matrix (where each element is the average value of

each image block) along with a preliminary classification of crack pixels based on their grey

level is applied, in order to equalize the average of the regions preliminarily labeled as non-

crack, maintaining the average intensity of the regions labeled as cracks. A region saturation

algorithm (top-hat) is also proposed in [6] to reduce the influence of white pixels that can lead to

standard deviation values similar to what is observed in blocks of pixels containing cracks,

therefore hampering the system performance when such a feature is used.

One of the methods discussed by Chambon and Moliard called morph in [13] presents pre-

processing techniques which include erosion, conditional median filter, conditional mean filter

and histogram equalization to reduce noise, producing a more uniform image and increasing the

contrast between cracks and road pavement. Median filters and grey-scale morphological filters

[14] can also be used to separate cracks from the background.
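
As a minimal sketch of the simplest of these filters, the Python fragment below applies a median filter to a small grey-level image stored as a list of rows. The function name `median_filter`, the `radius` parameter and the border handling (only neighbors inside the image are used) are illustrative assumptions, not details taken from [13] or [14].

```python
from statistics import median

def median_filter(image, radius=1):
    """Return a copy of `image` where each pixel is the median of its neighborhood.

    `image` is a list of equally long rows of grey levels. Border pixels use
    only the neighbors that fall inside the image.
    """
    height, width = len(image), len(image[0])
    filtered = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            row.append(median(window))
        filtered.append(row)
    return filtered

# A bright road surface with one isolated dark noise pixel in the middle.
noisy = [
    [200, 200, 200],
    [200, 10, 200],
    [200, 200, 200],
]
smoothed = median_filter(noisy)

# A one-pixel-wide vertical crack (middle column) on the same surface.
thin_crack = [[200, 200, 40, 200, 200] for _ in range(3)]
smoothed_crack = median_filter(thin_crack)
```

With a 3x3 window the isolated dark pixel disappears, but in this toy case the one-pixel-wide crack disappears as well, which illustrates why the filter size has to be chosen with care.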

Other methods take into account the fact that cracks often correspond to abrupt changes of the

image intensity surface. For instance, Gavilán et al. apply in [1] a histogram technique, together

with a sliding window technique with a determined size, step and threshold to smooth the

texture, reducing noise and enhancing the crack features. Anisotropic diffusion filtering to

smooth the image texture variation is used by Oliveira and Correia in [10]. Besides smoothing,

this technique can also be applied for restoration purposes. A partial differential equation (PDE),

to smooth the image texture and enhance the cracks, is presented in [15]. In [4] a PDE technique

is used for image segmentation.

A shadow-removal technique without affecting the crack pixels is presented in [16]. It consists of

four steps. The first one is a grey-scale morphological close operation for crack removal in the

image to ease the shadow area identification. Then a 2D Gaussian filter is applied to smooth

the texture, which further eases the shadow area identification. The third step consists of

creating N geodesic levels. Each geodesic level contains all the pixels between two grey-level

values, chosen so that every geodesic level has a similar number of pixels when compared to the

others. Then the first L low intensity levels are assigned to the shadow region while the remaining

levels are assigned to the non-shadow region. The value of L was empirically set by the authors. After

distinguishing the shadow region from the non-shadow region, the last step consists of applying

the following equation to eliminate the shadow and obtain a more uniform image:

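$$
I'_{ij} =
\begin{cases}
\alpha \cdot I_{ij} + \lambda, & \text{if } (i,j) \in S \\
I_{ij}, & \text{if } (i,j) \in B
\end{cases}
\qquad (1)
$$

where $\alpha = D_B / D_S$, i.e., the ratio between the intensity standard deviation of the non-shadowed

region B ($D_B$) and that of the shadow region S ($D_S$), and $\lambda = \bar{I}_B - \alpha \bar{I}_S$, where $\bar{I}_B$ is the

average intensity of region B and $\bar{I}_S$ the average intensity of region S.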
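
The last step can be sketched in Python as follows, assuming that the correction multiplies each shadow pixel by the ratio of the standard deviations of the two regions and adds an offset that aligns the two region averages (the exact form used in [16] should be checked against that paper). The shadow mask is taken as given, both regions are assumed to have a non-zero standard deviation, and the names `remove_shadow`, `alpha` and `lam` are illustrative.

```python
from statistics import mean, pstdev

def remove_shadow(image, shadow_mask):
    """Rescale shadow pixels so their mean and spread match the non-shadow region.

    `shadow_mask[r][c]` is True for pixels of the shadow region S and False
    for pixels of the non-shadow region B (the mask is assumed to be given).
    """
    flat = [(v, m) for row, m_row in zip(image, shadow_mask) for v, m in zip(row, m_row)]
    shadow = [v for v, m in flat if m]
    background = [v for v, m in flat if not m]
    alpha = pstdev(background) / pstdev(shadow)      # ratio D_B / D_S
    lam = mean(background) - alpha * mean(shadow)    # offset aligning the averages
    return [
        [alpha * v + lam if m else v for v, m in zip(row, m_row)]
        for row, m_row in zip(image, shadow_mask)
    ]

# Toy image whose two right-hand columns lie in a shadow.
image = [
    [100, 120, 50, 60],
    [110, 130, 55, 65],
]
mask = [
    [False, False, True, True],
    [False, False, True, True],
]
corrected = remove_shadow(image, mask)
```

In this toy example the shadow pixels are a darker, lower-contrast copy of the non-shadow pixels, so the corrected right-hand columns end up with the same grey levels as the left-hand ones.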

Figure 6 – Example of an input image (top) and the pre-processed image with histogram

equalization (bottom) [13].

In Figure 6 an example of a pre-processed image with histogram equalization taken from [13] is

presented. The non-uniform illumination is considerably reduced, producing a more uniform

image without significantly affecting the crack pixels.
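
For illustration, a global histogram equalization of an 8-bit grey-level image can be sketched in Python as below. This is the textbook cumulative-histogram mapping; whether [13] uses this global variant or a local one is not stated here, and the function name `equalize_histogram` is an assumption of this sketch.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grey-level image given as a list of rows of ints."""
    pixels = [v for row in image for v in row]
    histogram = [0] * levels
    for v in pixels:
        histogram[v] += 1

    # Cumulative count of pixels up to and including each grey level.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:          # constant image: nothing to stretch
        return [row[:] for row in image]

    # Map each grey level so that the output levels span the full range.
    mapping = [
        round((c - cdf_min) / (total - cdf_min) * (levels - 1)) if c else 0
        for c in cdf
    ]
    return [[mapping[v] for v in row] for row in image]

# A low-contrast strip: all grey levels lie between 100 and 120.
low_contrast = [[100, 100, 110, 120]]
equalized = equalize_histogram(low_contrast)
```

After equalization the darkest input level is mapped to 0 and the brightest to 255, so the contrast between dark (crack-like) and bright pixels is stretched over the whole grey-level range.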

The road pavement type, the image quality, the camera noise, the illumination and several

artifacts that can hamper crack detection may influence the pre-processing technique choice. In

particular, removing road pavement texture and correcting non-uniform background illumination

are two frequent pre-processing tasks which improve the crack detection performance. From the

techniques briefly presented above, mean and median filtering are two simple and fast

strategies that can be selected to make the image more uniform. However, depending on the

filter size selected, these techniques have the handicap of also erasing crack evidence. A more

complex technique capable of effectively reducing the intensity variance without hampering the

crack features of the database, used in [10], is anisotropic diffusion filtering. However, this

technique has the difficulty of selecting the right conduction coefficient and the number of

iterations. PDE techniques can detect any grey level transition between adjacent pixels, being

capable of detecting cracks efficiently. However this technique is very sensitive to noise and

requires a technique to eliminate it first [15]. Histogram equalization is an interesting choice

since it has the property of removing non-uniform illumination without affecting the crack

features. Top-hat is another interesting operation since it is simple to use and it produces a

more uniform image with the advantage of not hampering the crack pixels, while smoothing the

texture of the image. Despite anisotropic diffusion and PDE approaches being effective, their

implementation is somewhat complex and these are computationally time consuming

techniques. Therefore mean and median filters as well as top-hat and histogram equalization

are interesting pre-processing techniques to explore due to their simplicity, speed and efficiency.

2.3 Feature extraction

After the pre-processing block, the next step in the proposed system architecture (Figure 5) is

feature extraction. Depending on the feature quality, i.e., the ability to distinguish crack features

from non-crack features, the overall system performance can change drastically, so this step requires

special attention.

The pre-processing block can help to improve the feature quality since it can make the

cracks stand out from the background. For instance, if all the crack pixels are represented by a low grey

level while the non-crack pixels are represented by a much higher intensity after pre-processing,

the two classes are easier to distinguish, thus increasing the feature quality and affecting

the system performance positively. The ability to remove noise can also improve the feature

quality. Besides pre-processing, feature selection can also contribute significantly to the quality of

the extracted features.

Just like the pre-processing techniques, there are several features that can be chosen. Some

interesting features and those most commonly reported for crack detection are described in this

section.

Cracks have a number of properties that can be exploited to discriminate them from non-crack

features. Notably, they have photometric characteristics (dark pixels), as well as geometric

(elongated continuous structures) and frequency properties (they correspond to sudden

transitions in the image, thus being associated with high frequencies), that can be exploited by

crack detection algorithms.

To exploit the above properties, two main approaches to extract features, and therefore to

detect cracks can be distinguished in the literature: i) pixel-based methods and ii) block-based

methods. The first approach aims to segment the image into two sets: the foreground (cracks)

and the background (non-cracks), by classifying each image pixel based on its properties (e.g.,

intensity). The second approach splits the image into a set of (often non-overlapping) blocks

with the purpose of extracting features from each block. A supervised learning algorithm can

then be trained (e.g., a neural network) to discriminate crack from non-crack blocks. For both

approaches, a description of the most common type of features applied in each approach is

presented in the sequel.

2.3.1 Pixel-based

The pixel-based methods focus on several road surface crack properties, e.g. photometric and

geometric properties. The technique reported in [7] uses photometric properties such as the

pixel grey level as crack features. Tanaka and Uematsu, in [17], first use the photometric

property pixel grey level and then, in a pixel neighborhood, the photometric property mean

grey level and the geometric property local variance. Chambon and Moliard use photometric

features such as the pixel grey level and geometric characteristics such as the length and width of the crack

in the morph method introduced in [13]. Statistical features such as the mean and standard deviation

are used by Cheng et al. in [18] and by Nguyen et al. in [8].

Other papers use a set of crack frequency properties as features. For instance, in [19] a Sobel

edge detector is used. Wavelet coefficients can be computed for a given scale or for several

scales that are subsequently merged. Subirats et al. in [20] apply these two procedures for a

continuous 1D wavelet transform. A 2D continuous wavelet transform with several scales is

used in [21]. Chambon and Moliard in [13] present a second technique called GaMM that uses

wavelet coefficients as features. Contourlet coefficients are used in [22]. In [23] a non-

subsampled contourlet transform (NSCT) is used for image decomposition to extract the

coefficients at different scales and in different directions. A novel segmentation technique that

typically operates at a specific frequency and orientation is presented in [24] based on Gabor

filters.

Generally, photometric and geometric features are capable of detecting part of the crack region

but also a lot of unwanted noise, as observed in [7] and [17]. However, these features can

achieve better results, as presented in [13] using the morph technique. A more complex but

effective feature is the wavelet coefficient. For instance, the techniques GaMM and morph are

compared in [13], where morph achieves more true positives but GaMM achieves fewer false

positives (fewer non-crack pixels detected as crack pixels). Other papers using wavelets, such as

[20] and [21], can detect the crack regions well, with the handicap of also presenting some

unwanted noise. Mean and standard deviation also proved to be promising features. The

method presented in [8] (using mean and standard deviation as features) is compared to

the method of Subirats et al. in [25], being capable of detecting only the crack pixels, while the latter

also detects some noise.

2.3.2 Block-based

Block-based methods split the image into square blocks, extracting a set of features from each

block. Several features have been adopted for this purpose.

Mean and standard deviation of the image intensity in each block are two features that can be

used simultaneously. These two features are applied by Oliveira and Correia in [6] as crack

features, building a two dimensional feature space where each point corresponds to the mean

value and standard deviation of each block. A binary classifier labels each block as containing

cracks or not in [26], using a density feature, followed by a proximity feature and a fractal

dimension feature. A grid cell of 8x8 pixels is used in [27], each cell being labeled as containing

a crack, or not, according to the grey level of the border pixels. Rosa and Correia applied in [5]

the features dynamic range, minimum intensity pixel and standard deviation. Average value and

minimum value intensity are the features used in [28]. In [29] each block is labeled as crack or

non-crack block, depending on the crack pixels percentage.
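
To make the block-based feature extraction concrete, the following Python sketch splits a grey-level image into non-overlapping square blocks and computes, for each block, the mean, the standard deviation and the minimum intensity. It is only an illustration of the features named above: the function name `block_features`, the use of the population standard deviation and the choice to ignore incomplete border blocks are assumptions, not details taken from [5] or [6].

```python
from statistics import mean, pstdev

def block_features(image, block_size):
    """Split `image` into non-overlapping square blocks and describe each one.

    Returns a dict mapping (block_row, block_col) to the feature vector
    (mean, standard deviation, minimum intensity). Rows and columns that do
    not fill a complete block are ignored.
    """
    height, width = len(image), len(image[0])
    features = {}
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            pixels = [
                image[r][c]
                for r in range(top, top + block_size)
                for c in range(left, left + block_size)
            ]
            features[(top // block_size, left // block_size)] = (
                mean(pixels), pstdev(pixels), min(pixels)
            )
    return features

# 4x4 toy image: the top-right block contains two dark (crack-like) pixels.
image = [
    [200, 200, 200, 40],
    [200, 200, 200, 40],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
features = block_features(image, block_size=2)
```

In this toy image the block containing dark pixels has a lower mean, a higher standard deviation and a lower minimum than the uniform blocks, which is the kind of separation a classifier can exploit.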

Mean and standard deviation are two features commonly used in the block-based approach,

leading to good results. In [6] the best results are achieved for a parametric learning algorithm

with an f-measure of 94.7% while the best f-measure stated in [11] is 91.95%. Some papers

using these features in the pixel-based approach also report good results. For instance, 97.5%

of the images containing cracks are classified as such in [8] and good results are reported in

[18]. In [5] the best recall presented using the minimum intensity pixel is 95.44% and 95.02%

using standard deviation together with the minimum intensity pixel. These three features (mean,

standard deviation and minimum intensity pixel) show good results, among the

best reported in the literature, and are therefore a good starting point when developing a new

system.

2.4 Crack detection

The crack detection block uses the extracted features in order to detect cracks in road surface

images. Depending on their quality, different results can be achieved.

The current section, just like section 2.3, is divided into two subsections, one presenting the

approaches typically used in the pixel-based methods and another presenting the approaches

used in the block-based methods. Both approaches are described in the sequel.

2.4.1 Pixel-based methods

In the pixel-based approach, images are analyzed by methods that make crack detection

decisions for each individual image pixel. Most of the papers that follow this approach are based

on a pixel comparison of the image feature with a threshold, which is typically followed by some

kind of post-processing that enforces space continuity.

The simplest classification method using the pixel-based approach is thresholding. It is often

used for deciding which pixels in an image correspond to cracks [13] using the pixel intensity as

feature. Since crack pixels are often the darkest ones, all image pixels can be compared with,

e.g. a pre-defined threshold. Every image pixel that has its intensity lower than the threshold is

considered a crack pixel and receives the corresponding classification label, e.g., 1 (crack) and

0 (non-crack) otherwise. The thresholding operation can be stated as follows:

$$
L(x) =
\begin{cases}
1, & \text{if } I(x) < T \\
0, & \text{otherwise}
\end{cases}
\qquad (2)
$$

where T denotes the threshold, I(x) the feature value at position x and L(x) the binary label

assigned by the classification algorithm.
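
The thresholding rule (label 1 when the feature value is below the threshold, 0 otherwise) can be sketched in a few lines of Python. The function name `threshold_image` and the threshold value used in the example are illustrative choices.

```python
def threshold_image(image, threshold):
    """Label each pixel 1 (crack) if its intensity is below `threshold`, else 0."""
    return [[1 if value < threshold else 0 for value in row] for row in image]

image = [
    [180, 175, 60],
    [170, 55, 182],
    [50, 178, 90],
]
labels = threshold_image(image, threshold=100)
# labels == [[0, 0, 1], [0, 1, 0], [1, 0, 1]]
```

Note that the pixel with intensity 90 is also labeled as crack: any dark pixel below the threshold receives label 1, whether or not it belongs to a crack.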

In Figure 7 an application of this method is illustrated. The original image is shown on the left

and the thresholding output is represented on the right.

Figure 7 – Thresholding example. The original image is shown on the left and the thresholding

output is represented on the right.

As observed from Figure 7, thresholding has two major drawbacks. First, it leads to many false

positives i.e., pixels that are marked as belonging to cracks when they are not. This problem

can be partially alleviated using post-processing techniques, such as morphological operators

[14] capable of eliminating the most obvious errors, e.g., isolated pixels. Morphological tools

and top-hat operations are also used after thresholding the image for the same purpose in [17].

In [20] a noise removal post-processing based on morphological tools is also applied. A more

sophisticated way to enforce space continuity is based on the use of Markov random fields

presented by Chambon and Moliard in [13]. This model tries to connect local crack regions with

their respective neighbors, based on the comparison of their orientation and distance.

The second difficulty concerns the choice of the threshold. Classic automatic threshold

detection strategies, such as the Otsu method [30], do not perform well on road surface images

because the image histogram often has a single mode and it is not possible to find a meaningful

threshold separating the histogram into two modes. The influence of the crack pixels in the

histogram is negligible because they are far fewer than the non-crack pixels.

Other algorithms have been proposed with emphasis on the use of adaptive local histograms in

which the crack pixels may have larger influence [31].
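
The difficulty with a global automatic threshold can be illustrated with a small Python sketch of the Otsu criterion (maximization of the between-class variance). The implementation below is a plain textbook version written for this illustration, and the synthetic pixel values are assumptions chosen only to mimic a road image with very few crack pixels.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing the between-class variance.

    Pixels with value < t form one class and pixels with value >= t the other.
    """
    histogram = [0] * levels
    for v in pixels:
        histogram[v] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(1, levels):
        weight_low += histogram[t - 1]
        sum_low += (t - 1) * histogram[t - 1]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue
        mean_low = sum_low / weight_low
        mean_high = (total_sum - sum_low) / weight_high
        # Between-class variance up to a constant factor (1 / total**2).
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold

# 1000 road-texture pixels spread over 150..199 and only 10 dark crack pixels.
road = [v for v in range(150, 200) for _ in range(20)]
crack = [40] * 10
t_road = otsu_threshold(road + crack)

# For comparison: two equally populated, well separated modes.
t_bimodal = otsu_threshold([40] * 500 + [180] * 500)
```

In the first case the selected threshold falls inside the road texture range (above 150), so a large share of the road pixels would be labeled as crack; in the balanced case the threshold falls between the two modes.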

Another problem with thresholding techniques is that the spatial

organization of cracks is often not considered. The inability to distinguish cracks from

other artifacts that also have a low grey level, such as shadows, tire marks or oil stains, is

another limitation that can be pointed out.

2.4.2 Block-based methods

The block-based approach consists of dividing the image into a set of typically non-overlapping

blocks. Since the classifiers used in this method typically employ supervised learning

techniques, a ground-truth containing a set of images segmented by an expert, where each

block receives a binary label (crack or non-crack) is assumed to be known. Then, the classifier

is trained using part of the features (e.g., statistical properties) extracted from the blocks, and

tested afterwards on the remaining features, predicting the expert labels so that

the system performance can be measured against the ground-truth.
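
As an illustration of how predicted block labels can be compared with the ground-truth, the Python sketch below computes precision, recall and f-measure using their standard definitions. The function name `detection_scores` and the example labels are hypothetical.

```python
def detection_scores(predicted, ground_truth):
    """Return (precision, recall, f_measure) for binary block labels (1 = crack)."""
    pairs = list(zip(predicted, ground_truth))
    true_pos = sum(1 for p, g in pairs if p == 1 and g == 1)
    false_pos = sum(1 for p, g in pairs if p == 1 and g == 0)
    false_neg = sum(1 for p, g in pairs if p == 0 and g == 1)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return precision, recall, f_measure

# Six test blocks: classifier output versus expert labels.
predicted = [1, 1, 0, 0, 1, 0]
ground_truth = [1, 0, 0, 1, 1, 0]
precision, recall, f_measure = detection_scores(predicted, ground_truth)
```

Here two crack blocks are found, one is missed and one non-crack block is wrongly flagged, so precision, recall and f-measure all equal 2/3.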

Several classifiers have been reported in the literature to perform such a task, notably: i) neural

networks (NN) [32], ii) K-Nearest Neighbor (KNN) classifiers [6], iii) Adaboost [5] or iv) Support

Vector Machines (SVM) [26]. The following sections will briefly describe each of the four

techniques, with special attention given to SVM.

2.4.2.1 Neural Networks

Neural networks (NN) are one of the machine learning techniques used in the literature. The

relationship between the input and the output is typically non-linear and depends on a large

number of coefficients (weights) which must be learned from the data. Back propagation neural

network (BPNN), a feed-forward multi-layer network [33], is usually composed of three layers.

Each layer can be composed of several nodes. The first layer usually represents the input layer

and has as many nodes as the number of features being used for crack detection. The second

layer is a hidden layer and the third layer is the output layer, typically representing the class

attributed to the input. Another characteristic of the BPNN is the ability of the system output to

become closer to the desired output, through an adequate weight adjustment. Although the ability

to correctly classify noisy data is the major advantage of NN [29], this technique also presents

some limitations, namely, slow speed of convergence [26] during the learning phase and the

need of many good samples to train the system properly [29].

Li et al. in [26] compare two learning algorithms, namely BPNN and SVM, to label each image

block (40x40 pixels) with one of five possible crack types, respectively longitudinal crack,

transversal crack, alligator crack, block crack or no crack. The training set is composed of 450

images (90 images of each type are considered) and the testing set of 305 images (90

longitudinal cracks, 55 transversal cracks, 70 alligator cracks, 50 block cracks and 40 no crack).

The SVM parameters are computed through a genetic algorithm while the BPNN is composed of

15 nodes in the hidden layer with a learning rate of 0.01. The final results have shown that SVM

was more accurate than BPNN for all the training sizes considered (100, 200, 300 and 450

images) and was always faster than BPNN as well. The best results of the SVM and BPNN

were for the training set of 450 images, achieving a classification rate of the correct crack type

of 78.4% and 69.6%, respectively.

Three neural network techniques (Image-based Neural Network, Histogram-based Neural

Network, and Proximity-based Neural Network) are used to detect four crack types namely

longitudinal, transversal, alligator and block cracks in [29]. The best result was produced by

Proximity-based Neural Network with a recall of 95.2%. For each of the three techniques a

40x40 block size was used. A database of 450 artificial images (90 images of each crack type

and 90 with no cracks) was generated. All the neural networks were trained with 300 artificial

images and tested with 124 actual pavement images and 150 artificial images. Several numbers of hidden

nodes (30, 60, 90, 120 and 150), learning coefficients (0.1, 0.05, 0.01 and 0.001) and training

epochs (500, 1000, 1500 and 2500) were explored to find an optimal architecture. For all the

three neural networks used, the best results were achieved for 60 hidden nodes with a learning

coefficient of 0.01 for 1500 epochs.

The block crack is a crack type not yet introduced in this dissertation. Despite being similar to the

alligator crack type, it is presented in [29] as a development of the transverse crack,

showing some rectangular patterns. Figure 8 presents an example retrieved from Google

Images.

Figure 8 – Example of a block crack type.

2.4.2.2 K-Nearest Neighbor Classifier

The K-nearest neighbor classifier is a non-parametric machine learning algorithm [6] which

labels each test sample, taking into account the class of the closest training samples. The most

voted class among the classes of the k nearest neighbors dictates the class attributed to the test

sample [33]. The purpose of the training set is not only to supply samples labeled with the

respective class (e.g. using the ground-truth) but also to find the number of neighbors k [34] (typically

a small value) that supplies the best system accuracy rate [33]. To minimize the influence

of the neighbors that belong to the most frequent class, a possibility is weighting them according

to their distance. The further they are from the test sample, the less relevant is their contribution

to determine the class of the test sample. That way the class that has a larger number of samples does not impose itself

due to its size.
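
The voting scheme, with and without distance weighting, can be sketched in Python as follows. The inverse-distance weight, the Euclidean distance and the function name `knn_classify` are assumptions made for this illustration; the cited works may use other weighting schemes. The example feature vectors are hypothetical (mean, standard deviation) pairs.

```python
import math
from collections import defaultdict

def knn_classify(train_features, train_labels, sample, k=3, weighted=False):
    """Label `sample` by a vote among its k nearest training samples."""
    nearest = sorted(
        (math.dist(features, sample), label)
        for features, label in zip(train_features, train_labels)
    )[:k]
    votes = defaultdict(float)
    for distance, label in nearest:
        # Inverse-distance weighting; the small constant avoids division by zero.
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

# One crack sample and three samples of the more frequent non-crack class.
train_features = [(130, 60), (190, 10), (195, 8), (200, 5)]
train_labels = ["crack", "non-crack", "non-crack", "non-crack"]
sample = (140, 50)

plain_vote = knn_classify(train_features, train_labels, sample, k=3)
weighted_vote = knn_classify(train_features, train_labels, sample, k=3, weighted=True)
```

With a plain majority vote the two more distant non-crack neighbors outvote the single close crack neighbor, while with distance weighting the close neighbor prevails.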

Oliveira and Correia apply a 1-KNN (one nearest neighbor) in [6] for crack detection together

with estimated posterior probability density functions, achieving a recall of 94.6%. The

database used has images with several crack types, namely longitudinal, transversal and

miscellaneous, but also images without cracks. Two different resolutions are reported, namely

2048x1536 and 1858x1384, with a chosen block size of 75x75 pixels. Three pattern

recognition algorithms are compared in [33], namely, KNN, artificial neural networks (ANN) and

SVM, for eggshell crack detection. The features used for the three techniques were the same,

and for each technique the internal parameter set leading to the best performance was selected. The

results showed that SVM achieved a correct identification rate (attributing the right

label to each egg) of 97.1%, against 92.1% for the neural network and 88.9% for KNN.

2.4.2.3 Adaboost

Adaboost is a learning algorithm based on boosting, capable of building a strong classifier with

high accuracy rate through the combination of several weak classifiers [35]. The weak

classifiers are iteratively trained and the best weak classifier (the one that produces the best

results) is selected in each iteration. Then, the weight of the misclassified data is increased while

the weight of the correctly classified data is decreased [5]. That way, a weak classifier that correctly

classifies the data misclassified in the previous iteration will have a higher weight than the other

classifiers and will be chosen to be part of the final classifier, making the final classifier a

strong and versatile one, as stated in [36].
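
The iterative reweighting can be sketched in Python with the classic discrete AdaBoost and one-feature threshold rules (decision stumps) as weak classifiers. This is a generic textbook variant written for illustration; it is not the Modest, Gentle or Real Adaboost used in [5], and all function names and the toy data are assumptions.

```python
import math

def train_stump(samples, labels, weights):
    """Pick the single-feature threshold rule with the lowest weighted error."""
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({s[feature] for s in samples}):
            for polarity in (1, -1):
                predictions = [
                    polarity if s[feature] < threshold else -polarity for s in samples
                ]
                error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best

def stump_predict(stump, sample):
    _, feature, threshold, polarity = stump
    return polarity if sample[feature] < threshold else -polarity

def adaboost_train(samples, labels, rounds=10):
    """Discrete AdaBoost: returns a list of (alpha, stump) pairs."""
    weights = [1.0 / len(samples)] * len(samples)
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(samples, labels, weights)
        error = max(stump[0], 1e-10)          # guard against log(0)
        if error >= 0.5:                      # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the others.
        weights = [
            w * math.exp(-alpha * y * stump_predict(stump, s))
            for w, s, y in zip(weights, samples, labels)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, sample):
    score = sum(alpha * stump_predict(stump, sample) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy 1D data that no single threshold can separate (labels: +1 / -1).
samples = [(1,), (2,), (3,), (4,), (5,), (6,)]
labels = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_train(samples, labels, rounds=3)
predictions = [adaboost_predict(ensemble, s) for s in samples]
```

No single stump classifies this toy set correctly, but the weighted combination of the three stumps selected in successive rounds does.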

Three kinds of Adaboost classifiers are tested in [5], namely Modest Adaboost, Gentle

Adaboost and Real Adaboost. Modest Adaboost is the algorithm chosen since it

converges faster and provides a better system overall performance than the other two. The

number of iterations for each training set was 100, since this value achieves the minimum error. The

images of the two databases used are grey-scale images with low texture variation,

each image being divided into blocks of 64x64 pixels. The best result achieved for the first

database was a recall of 95.44% with a crack type classification (longitudinal, transversal and

miscellaneous) 100% correct, for a training set of 25% of the respective database images. The

best result for the second database, which has images that are harder to analyze, was a recall of

85.44% with a crack type classification also 100% correct, for a training set of 25% of the

respective database images.

2.4.2.4 Support Vector Machines

This subsection addresses support vector machines (SVM), the classifier used in this

dissertation for the detection of road surface cracks. An SVM classifier, just like other learning

algorithms, is composed of training and testing stages [2]. In the training stage the selected

features are extracted and typically mapped into a higher dimensional space in order to

efficiently separate crack features from non-crack features. Since the ground-truth of the

training set is supplied, the features that correspond to cracks and to non-cracks can be

determined. Then, as illustrated in Figure 9, SVM selects the set of points in each class (support

vectors) that are the nearest to the other class and through them computes a hyperplane that

separates the two classes, being as far as possible from the support vectors. This hyperplane is

often called the maximum-margin hyperplane and makes SVM robust.

Once the system has been trained, the following phase is the testing stage. In this stage each

testing sample is classified as belonging to one of the two pattern classes. For that purpose, the testing

set features are mapped into the same dimensional space produced in the training stage and,

according to the side of the hyperplane on which they fall, the corresponding pattern class is attributed. Finally,

the classifier accuracy can be evaluated by comparison against a set of manually labeled data.

Figure 9 – SVM feature space example that selects the support vectors to separate the two

pattern classes through a hyperplane taken from [37].

The example illustrated in Figure 9 is very simple and there is no need to map the extracted

features into a higher dimensional space since they can be easily separated by a hyperplane.

However, the typical case is much more complex and the two classes are often mixed, making it

necessary to map the features in order to better separate the two classes.

Note that in Figure 9 several different hyperplanes could separate the two classes. However, the

computed hyperplane is the one that is as far as possible from the support vectors.

The use of an SVM classifier involves solving an optimization problem [26] and optimizing a

regularization parameter C that defines the cost associated with misclassified data ($\xi_i$) and

influences the model complexity [33]. Other parameters may need to be optimized but that

depends on the kernel function selected. The kernel function selection is a critical decision [33],

since it is the function responsible for mapping the features, typically to a higher dimensional

space. The features are mapped because of the difficulty in separating the two pattern

classes in the original space.

The kernel function can be described as $K(x, x') = \phi(x) \cdot \phi(x')$, i.e., the dot product between $\phi(x)$

and $\phi(x')$ [38]. There exist several functions $\phi$ that originate several different kernels. The most

commonly used kernels are:

Linear: $K(x_i, x_j) = x_i^T x_j$

Polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$, $\gamma > 0$

Radial basis function (RBF): $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$

Sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$

where $\gamma$, r and d correspond to kernel parameters that can be defined or estimated.
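
For reference, the four kernels listed above can be written directly as small Python functions operating on feature vectors given as tuples. The function names are illustrative, and `gamma`, `r` and `d` stand for the kernel parameters; no particular parameter values are implied by the text.

```python
import math

def dot(x, y):
    """Dot product of two equally long feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, gamma, r, d):
    return (gamma * dot(x, y) + r) ** d

def rbf_kernel(x, y, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, gamma, r):
    return math.tanh(gamma * dot(x, y) + r)

# Two hypothetical 2D feature vectors.
x_i, x_j = (1.0, 2.0), (3.0, 4.0)
k_linear = linear_kernel(x_i, x_j)                       # 1*3 + 2*4
k_poly = polynomial_kernel(x_i, x_j, gamma=1.0, r=1.0, d=2)
k_rbf_same = rbf_kernel(x_i, x_i, gamma=0.5)             # identical vectors
```

The RBF kernel equals 1 for identical vectors and decays towards 0 as the vectors move apart, at a rate controlled by `gamma`.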

Among the kernels listed above, RBF is the one typically recommended to start with. First of all

RBF can map the original space into a higher dimensional space in a nonlinear way, as can

be seen in the kernel expression above. This is an advantage if the relation between the class

label and feature vectors is nonlinear. In this case, the linear kernel would not be a proper

choice, since it can only produce a linear decision boundary in the original feature space.

Also, depending on the parameters selected, the performance of

the RBF kernel and the linear kernel can be the same [38]. The sigmoid kernel has, on

the one hand, the handicap of being invalid for some parameter values and, on the other hand,

provides results similar to those of RBF for other sets of parameters. Polynomial

kernels, just like sigmoid kernels, have more parameters to estimate than RBF, which

requires a higher computational effort.

The advantage of RBF thus lies in presenting fewer numerical difficulties when compared to

the other non-linear kernels. However, in case of a large number of features and a small number

of training samples, the use of the linear kernel may be better than the RBF.

The SVM optimization problem is described as follows:

$$
\min_{w,\, b,\, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i
\qquad \text{subject to:} \quad
y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0
\qquad (3)
$$

where w is the normal vector, C is the regularization parameter, $\xi_i$ is the error of the

misclassified data, $y_i$ the label attributed to each pattern (e.g. -1 for non-crack and 1 for crack),

the function $\phi$ depends on the kernel function used, $x_i$ the feature point and b the offset of the

hyperplane.
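
To make the two terms of the objective in (3) tangible, the Python sketch below evaluates them for a given hyperplane in the linear-kernel case, assuming the usual soft-margin form: half the squared norm of w plus C times the sum of the slack values. Each slack is taken as the smallest non-negative value satisfying the margin constraint. The function name and the toy data are hypothetical, and the sketch only evaluates the objective; it does not solve the optimization problem.

```python
def soft_margin_objective(w, b, samples, labels, C):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi_i) for a linear kernel.

    Each slack xi_i is the smallest non-negative value that satisfies
    y_i * (w . x_i + b) >= 1 - xi_i, i.e. max(0, 1 - y_i * (w . x_i + b)).
    """
    slacks = [
        max(0.0, 1.0 - y * (sum(wj * xj for wj, xj in zip(w, x)) + b))
        for x, y in zip(samples, labels)
    ]
    margin_term = 0.5 * sum(wj * wj for wj in w)
    return margin_term + C * sum(slacks), slacks

# 1D toy data: the third sample lies on the wrong side of the hyperplane.
samples = [(2.0,), (-2.0,), (0.5,)]
labels = [1, -1, -1]
objective, slacks = soft_margin_objective(w=(1.0,), b=0.0, samples=samples, labels=labels, C=10.0)
# slacks == [0.0, 0.0, 1.5] and objective == 0.5 + 10 * 1.5 == 15.5
```

Only the misplaced sample contributes a non-zero slack, and the value of C decides how strongly that violation is penalized relative to the margin term.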

As observed from Figure 9, the smaller the norm of the normal vector w, the greater the distance

between the two hyperplanes passing through the support vectors of the two different patterns. Since SVM

computes the hyperplane that is as far away as possible from the support vectors, the

minimization of the norm of the normal vector is the key to solving the optimization problem.

The second term of the optimization problem handles the cases in which the hyperplane cannot fully

separate the two pattern classes (e.g. when they are too mixed) and also reduces the

influence of potential outliers. Therefore, to separate the two classes as correctly as possible,

the term $\xi_i$ is introduced to achieve the minimum error. The parameter C is a constant

and establishes the influence of the $\xi_i$ terms. This approach is also known as the soft margin.

The condition $y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i$ forces the training pattern to be higher than $1 - \xi_i$

when $y_i$ is 1 and below $-(1 - \xi_i)$ when $y_i$ is -1, therefore establishing a decision boundary that

separates the two classes.

The SVM classifier performs a binary classification of the data. However SVM can be

transformed into a multiclass classifier. Typically two kinds of multiclass methods can be used

for this purpose. One of them is the one against one (OAO) technique. It consists of

creating N(N-1)/2 classifiers, where N is the number of pattern classes, each classifier being

trained on a different pair of classes. Each classifier then votes for one pattern class

for the testing sample and the class with the most votes wins, the testing sample being labeled with that

class [26]. The OAO multiclass method is used in [1] to select one of ten different surface classes

and to classify the several types of cracks in [26].

The other multiclass method is one against all (OAA). This technique is not very frequently used

in crack detection problems. Basically, there are as many classifiers as classes, and the

classifier that produces the highest output is the one that labels the pattern class of the testing

sample.
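
The one against one voting can be sketched in Python as below. The pairwise classifiers are stand-ins: in practice each would be a trained binary SVM, whereas here they are hypothetical toy functions that simply compare two scores, so that the voting logic can be run on its own.

```python
from collections import Counter
from itertools import combinations

def one_against_one_predict(classes, pairwise_classifiers, sample):
    """Vote among the N(N-1)/2 pairwise classifiers and return the winning class.

    `pairwise_classifiers[(a, b)]` is a function returning either `a` or `b`.
    """
    votes = Counter(
        pairwise_classifiers[pair](sample) for pair in combinations(classes, 2)
    )
    return votes.most_common(1)[0][0]

def make_toy_classifier(a, b):
    """Hypothetical binary classifier: picks the class with the higher score."""
    return lambda sample: a if sample[a] >= sample[b] else b

classes = ["longitudinal", "transversal", "alligator"]
pairwise_classifiers = {
    pair: make_toy_classifier(*pair) for pair in combinations(classes, 2)
}
sample = {"longitudinal": 0.2, "transversal": 0.7, "alligator": 0.4}
winner = one_against_one_predict(classes, pairwise_classifiers, sample)
```

With N = 3 classes there are 3 pairwise classifiers; in this example "transversal" wins two of the three pairwise votes.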

As stated before, a few papers using SVM to detect pavement distress can be found in the

literature, showing better results than the alternative methods against which they were

compared [33] [26]. For that reason, an SVM classifier is expected to

achieve good results in this dissertation.

The structure of the SVM classifier developed is presented in Chapter 3 and the corresponding

classification results are reported in Chapter 4.

2.5 Crack classification

After performing crack detection, the final stage of the proposed architecture is crack

classification, aiming to assign a crack type to each previously detected crack. Although some

papers address this stage, they do not perform it after detecting the cracks. They simply label

the crack type for each image without having detected the cracks automatically first. Therefore

those papers were described in the previous section. This section addresses the papers that

use these two blocks simultaneously in their system.

Several types of cracks can be distinguished in road pavement images. According to JAE [3],

cracks can be divided into transversal, longitudinal, miscellaneous and alligator cracks. Other

papers consider a fifth different crack type designated as block crack. Some examples of these

types can be observed in Figure 2 and Figure 8, respectively.

Automatic monitoring systems should be able not only to detect cracks but also to classify them

according to their type. This can be done by extracting a set of features of the blocks labeled as

crack by the classifier.

In [6] three different types of cracks are considered, namely longitudinal, transversal and miscellaneous. A 2D feature set based on the standard deviation of the crack pixel coordinates is considered, i.e., for each detected crack the standard deviations of the crack block row coordinates and of the crack block column coordinates are computed. These features can be visualized using two orthogonal axes: the vertical axis represents the standard deviation of the crack row coordinates and the horizontal axis the standard deviation of the crack column coordinates, so that each crack is represented by a point in this space. A bisectrix line is also considered in this 2D space. The following rules are used to classify each crack:

- If the distance to the nearest axis is less than the distance to the bisectrix line and the nearest axis is the vertical one, then the crack type is longitudinal;

- If the distance to the nearest axis is less than the distance to the bisectrix line and the nearest axis is the horizontal one, then the crack type is transversal;

- If the distance to the nearest axis is more than the distance to the bisectrix line, then the crack type is miscellaneous.

Figure 10 is extracted from [6] and shows the 2D space for crack classification:

Figure 10 – 2D standard deviation for crack classification with an example of a longitudinal

crack represented by the point L1 taken from [6].

Points with the same row and column standard deviation values lie on the bisectrix, representing perfect miscellaneous cracks. Perfect transversal cracks and perfect longitudinal cracks are represented by points on the horizontal and vertical axes, respectively. In the case of the point L1 (Figure 10), the minimum of the distances from L1 to the vertical axis and from L1 to the horizontal axis (designated by dA) is computed first. Then the distance between the point L1 and the bisectrix (designated by dL) is computed. Finally, the smaller of these two distances (dA and dL) dictates the crack type. In this particular example, since the point is closer to the vertical axis than to the bisectrix, the crack is labeled as a longitudinal crack.
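The distance comparison described for the point L1 can be sketched as follows. This is only an illustration of the rule, not the implementation of [6]: the function name is hypothetical, the convention "nearest axis vertical means longitudinal" follows the L1 example of Figure 10, and the handling of ties (dA equal to dL) is an assumption.

```python
import math


def classify_crack(std_rows, std_cols):
    """Label a crack from the standard deviations of its row/column coordinates.

    The crack is the point (std_cols, std_rows): column deviation on the
    horizontal axis, row deviation on the vertical axis.
    """
    # The distance to the vertical axis is the horizontal coordinate and
    # the distance to the horizontal axis is the vertical coordinate.
    dist_vertical_axis = std_cols
    dist_horizontal_axis = std_rows
    d_axis = min(dist_vertical_axis, dist_horizontal_axis)  # dA
    # Distance to the bisectrix std_rows = std_cols (dL).
    d_bisectrix = abs(std_rows - std_cols) / math.sqrt(2)

    if d_axis >= d_bisectrix:  # assumption: ties count as miscellaneous
        return "miscellaneous"
    if dist_vertical_axis < dist_horizontal_axis:
        return "longitudinal"
    return "transversal"
```

For instance, a crack whose row coordinates are widely spread while its column coordinates barely vary lies close to the vertical axis and is labeled longitudinal.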

Oliveira and Correia in [6] achieved a crack type classification rate of 100%. Rosa and Correia in [5] use the same approach after labeling the image blocks, since the crack types were the same (transversal, longitudinal and miscellaneous). The best classification rate obtained was 100% for the two databases used.

3 Proposed System

This Chapter presents the system developed in this dissertation for the detection of cracks in

road images and discusses the techniques involved in each processing step.

3.1 System architecture

The block diagram of the proposed system is shown in Figure 11. The system comprises four processing blocks. The input image is pre-processed to enhance the contrast between the crack and the pavement, speeding up the image computation and facilitating crack detection. Then the pre-processed image is split into non-overlapping blocks (75x75 pixels), $B_i$. Each block can then be characterized by a vector of features, $x^{(i)}$, which describes the statistical properties of the block. A binary classifier is then used to assign a label, $y^{(i)} \in \{0, 1\}$, to each block $B_i$, depending on whether it contains crack pixels, $y^{(i)} = 1$, or not, $y^{(i)} = 0$.

Figure 11 – Block diagram of the proposed crack detection system.

After classifying each block according to the presence of cracks, a post-processing operation is performed to correct unusual label configurations, e.g., isolated blocks classified as cracks surrounded by non-crack blocks. A final module that would take each detected connected crack region and classify it according to the considered crack types (longitudinal, transversal or miscellaneous) was not included in this architecture due to time restrictions.

The following sections describe the databases used in this dissertation and each of the four

processing stages.

3.2 Databases

The current section aims to present relevant information about the databases used in this

dissertation.

[Figure 11 block labels: Pre-processing, Feature extraction, Crack detection, Post-processing]

Two databases with two different road pavement types have been considered in this dissertation. Both databases consist of asphalt pavements but present relatively different road pavement textures, as illustrated in Figure 12. For instance, the grey level differs between the two databases, leading to two distinct pavement types.

Road pavement types can differ from each other in many ways based on their texture composition, as presented in Figure 1. Not only can the grey level vary from one pavement to another, but also the granulation size and the granulation distribution.

Figure 12 – Example of two images of the two different databases considered

Both databases are composed of grey-scale images, one acquired with a digital camera during a human observation survey (the left image of Figure 12) and the other acquired through an LRIS system [9] (the right image of Figure 12). The two databases contain not only images with several types of cracks, namely longitudinal, transversal and miscellaneous (see the first three images of Figure 2), but also images with no cracks. In general, all images of the two databases were taken with the optical axis of the camera orthogonal to the pavement, so that each pixel corresponds approximately to 1 mm². Both image databases are affected by non-uniform background illumination and texture noise produced by the road surface.

The first database, Road1, has 49 images with cracks and 7 with no cracks, while the second database, Road2, has 87 images with cracks and 78 with no cracks. For the two databases considered, each block has been manually labeled by a specialist as containing crack pixels or not, providing a ground-truth classification that can be used for classifier training as well as for classifier evaluation purposes. The resolution of each database is also different: Road1 has a resolution of 2048x1536 and Road2 a resolution of 2048x4096. Since both resolutions are high, and computation time and memory storage are common problems in image processing, each image is divided into blocks of 75x75 pixels, allowing a faster computation [29] and lower memory storage requirements. Bigger blocks would negatively influence the accuracy rate of the crack detectors, while smaller blocks would increase the computation time. The selected size is a good compromise between these two constraints.
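The division into non-overlapping blocks can be sketched with NumPy as shown below. Two points are assumptions and not stated in the text: the resolution 2048x1536 is read as width x height, and border pixels that do not fill a complete 75x75 block are discarded. The function name is illustrative.

```python
import numpy as np

BLOCK = 75  # block side in pixels, as stated in the text


def split_into_blocks(image, block=BLOCK):
    """Return an array of shape (rows, cols, block, block) of non-overlapping blocks.

    Assumption: pixels at the right and bottom borders that do not fill a
    complete block are discarded.
    """
    n_rows, n_cols = image.shape[0] // block, image.shape[1] // block
    cropped = image[: n_rows * block, : n_cols * block]
    # Split both axes into (block index, offset inside block), then bring
    # the two block indices to the front.
    return cropped.reshape(n_rows, block, n_cols, block).swapaxes(1, 2)


# Synthetic image with the Road1 resolution (1536 rows x 2048 columns).
road1_like = np.zeros((1536, 2048), dtype=np.uint8)
blocks = split_into_blocks(road1_like)
```

Under these assumptions a Road1 image yields a grid of 20 x 27 blocks, i.e. 540 feature vectors per image.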

3.3 Pre-processing

This section focuses on the pre-processing techniques used in the proposed crack detection

system, to more easily differentiate crack blocks from non-crack blocks, thus contributing to

improve the overall system accuracy.

Since the database images do not contain shadows [16] or large regions with high pixel

intensity like light halo [13], the pre-processing stage can be simplified. Nevertheless, the

images exhibit different visual properties caused by non-uniform illumination and non-uniform

road surface texture. Removing the influence of such factors, while preserving information about

the presence of cracks, will facilitate crack detection and improve the robustness of the system.

In the previous Chapter some techniques that could mitigate these kinds of artifacts were presented. Based on that discussion, and in view of the observed characteristics of the image databases used in this dissertation, the pre-processing techniques selected as appropriate are: i) top-hat filter, ii) mean filter, iii) min-filter and iv) adaptive histogram equalization.

Figure 13 shows the three distinct pre-processing configurations considered in the developed system: i) top-hat; ii) mean filter followed by top-hat; and iii) min-filter followed by adaptive histogram equalization. Top-hat is a filter that eliminates high-intensity noise without affecting the crack blocks, while also producing a more uniform image. For that reason it was one of the configurations chosen. Mean filter followed by top-hat is the second configuration selected for the developed system. The mean filter can be used to eliminate most of the non-uniform texture and most of the non-uniform illumination. It is applied first to guarantee a more uniform background, without jeopardizing the contrast between the background and the crack pixels. The last pre-processing configuration seems promising, since it first reinforces the crack pixels through a min-filter and then applies adaptive histogram equalization to enhance the contrast between the cracks and the pavement.

[Figure 13 diagram: Image Database, followed by three parallel branches: Top-hat (output J1); Mean filter and Top-hat (output J2); Min filter and AHE (output J3)]

Figure 13 – The pre-processing configurations applied to the image database, namely top-hat, mean filter followed by top-hat, and min-filter followed by adaptive histogram equalization (AHE).

The three configurations use techniques that depend on parameters that must be pre-defined. The top-hat filter sets a maximum grey value, as can be seen in Figure 14 (extracted from [6]). This parameter is chosen as a value that yields a more uniform image without damaging the overall system accuracy (i.e., without causing non-crack blocks to be classified as crack blocks). Based on the mean histograms of the non-crack blocks and crack blocks of Road1 (Figure 15), the grey value that satisfies these two conditions is empirically set to 150. Since beyond 150 the non-crack histogram decreases more rapidly than the crack histogram, the percentage of crack blocks that have a high average value is higher than the percentage of non-crack blocks. Therefore the value 150 has a bigger impact on the average value of the crack blocks than on that of the non-crack blocks. A lower value would not bring any benefit, since the slopes of the two histograms are similar there and the two block types would become very hard to distinguish from each other. With higher grey values the image would be less uniform and the number of crack blocks with a high average would be considerable.

The mean filter parameter is a pre-defined mask size. Since crack pixels are very rare, they are influenced by the grey level of the non-crack pixels (typically of much higher intensity) around them. Therefore the mask must be small enough not to hamper the crack pixels. Based on this, the adopted size was a mask of 3x3 pixels.

The min-filter also uses a 3x3 mask. Since the min-filter chooses the minimum value within the mask for each pixel, a bigger mask would excessively darken not only the crack area but also non-crack areas, due to possible isolated pixels with low grey levels. The adopted size is a good compromise between reinforcing the crack pixels and preventing non-crack areas from being misclassified later by the classifier, which would compromise the overall system accuracy.
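The three parameterized operations can be sketched with NumPy as follows. The operation called top-hat in this section is implemented here as the saturation described in Figure 14 (intensities above the chosen grey value are set to that value). The function names and the replication of border pixels are assumptions made for illustration, and adaptive histogram equalization is left out of the sketch.

```python
import numpy as np


def saturate(image, max_grey=150):
    """Set every intensity above `max_grey` to `max_grey` (Figure 14)."""
    return np.minimum(image, max_grey)


def _neighbourhood_stack(image, size=3):
    """Stack the size x size shifted copies of the image.

    Assumption: border pixels are replicated so the output keeps the input size.
    """
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    return np.stack([padded[r:r + h, c:c + w]
                     for r in range(size) for c in range(size)])


def mean_filter(image, size=3):
    """Replace each pixel by the mean of its size x size neighbourhood."""
    return _neighbourhood_stack(image.astype(float), size).mean(axis=0)


def min_filter(image, size=3):
    """Replace each pixel by the minimum of its size x size neighbourhood."""
    return _neighbourhood_stack(image, size).min(axis=0)
```

With these helpers, the first configuration would correspond to `saturate(image)` and the second to `saturate(mean_filter(image))`; the third would apply adaptive histogram equalization to `min_filter(image)`.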

Figure 14 – Top-hat filter representation. All the image pixel intensities above api will have a grey level equal to api.

Figure 15 – Mean histogram of the cracks (left) and non-cracks (right) of Road1.

An example of each configuration with the parameters selected for each technique (the same as

stated before in this section) is shown in Figure 16.

Figure 16 – Example of the three pre-processing combinations. Top-left is the original image,

top-right the top-hat image, bottom-left mean filter followed by top-hat and bottom-right min-filter

followed by adaptive histogram equalization.

3.4 Feature extraction

Based on the discussion in the block-based sub-section of the previous Chapter about the features that lead to promising results, the following set of features was selected to characterize each block:

Minimum value

Mean intensity

Variance

Higher order moments (3rd, 4th and 5th order)

Each of these features will be discussed in the sequel.

3.4.1 Features selected for road crack detection

Since crack pixels correspond to low intensity values, one feature that is capable of

distinguishing crack blocks from non-crack blocks is the minimum intensity pixel (mip) [5] value

within an image block:

$$mip^{(i)} = \min_{\mathbf{x} \in B_i} I(\mathbf{x}) \qquad (4)$$

where $I(\mathbf{x})$ denotes the image intensity at position $\mathbf{x}$ in the i-th block, $B_i$.

Mean is the second feature used for crack detection. This feature describes the average intensity within a block [6] and differentiates crack from non-crack blocks, since the more crack evidence a block has, the lower its mean. The mean value is computed as follows:

$$\mu^{(i)} = \frac{1}{N^2} \sum_{\mathbf{x} \in B_i} I(\mathbf{x}) \qquad (5)$$

where $N^2$ is the block size, i.e., the number of pixels in block $B_i$.

The third feature considered is the second order statistic (variance), which has also already

been used to detect cracks before [6]. Actually in [6], the feature used is standard deviation, but

since the standard deviation is the square root of the variance the information extracted from

each block is essentially the same.

Assuming that the mean values of the blocks are similar to each other, and since there are many more non-crack pixels than crack pixels but the crack grey level is much lower, the variance of crack blocks should be higher than the variance of non-crack blocks. The variance of block $B_i$ is computed as follows:

$$\left(\sigma^{(i)}\right)^2 = \frac{1}{N^2} \sum_{\mathbf{x} \in B_i} \left( I(\mathbf{x}) - \mu^{(i)} \right)^2 \qquad (6)$$

This approach can be extended to higher order statistics. Notably, the central moment of order $k$ is defined as follows:

$$\mu_k^{(i)} = \frac{1}{N^2} \sum_{\mathbf{x} \in B_i} \left( I(\mathbf{x}) - \mu^{(i)} \right)^k \qquad (7)$$

As with the variance, the bigger the absolute value produced for a block, the higher the probability of it being a crack block, since in this database a low grey level is unlikely to be anything other than a crack. It can be just a random dark point, but there is no risk of confusing it with a shadow or an oil stain. The odd order moments have the advantage of producing a negative contribution whenever the pixel intensity is lower than the block mean, which is not possible for the even order moments. In fact, the more negative the feature is, the more likely the block is to be a crack block. The advantage of the odd moments over the even moments comes from the fact that crack pixels are mostly below the block mean value, while non-crack pixels are mostly much closer to the mean, which separates the two pattern classes better.

Mip, mean and variance (standard deviation) are features already seen in the literature that provided good results (see section 2.3.2). The higher order moments defined in this section do not often appear in the literature. They follow from the frequent use of the first and second order moments, being merely experimental features that proved to be helpful in the developed classifier, as shown in the following Chapter. The third order moment has, however, been used in [4] for pothole detection.
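The six features of this section can be computed for one block with a few lines of NumPy. This is a minimal sketch of the definitions (minimum, mean, and central moments of order 2 to 5, each normalized by the number of pixels); the function name and the ordering of the output vector are illustrative choices.

```python
import numpy as np


def block_features(block):
    """Return [mip, mean, variance, 3rd, 4th, 5th central moment] of a block."""
    pixels = np.asarray(block, dtype=float).ravel()
    mean = pixels.mean()
    centred = pixels - mean
    # Central moments of order 2 (variance), 3, 4 and 5.
    moments = [np.mean(centred ** k) for k in (2, 3, 4, 5)]
    return np.array([pixels.min(), mean, *moments])


# Toy block: three bright pixels and one dark ("crack-like") pixel.
features = block_features([[10, 10], [10, 2]])
```

In this toy block the single dark pixel lies far below the mean, so the third and fifth order moments come out negative while the even order moments stay positive, which is the behaviour described above for the odd moments.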

3.4.2 Statistical properties of the selected features

This section briefly characterizes the statistical properties of the proposed features. All the features extracted from the database can be represented in a vector, usually called feature vector. The feature vector and block label, respectively $x^{(i)}$ and $y^{(i)}$, are random variables characterized by a joint probability distribution:

$$p(x^{(i)}, y^{(i)}) = p(x^{(i)} \mid y^{(i)}) \, p(y^{(i)}) \qquad (8)$$

where $p(x^{(i)} \mid y^{(i)})$ is the probability density function of the feature vector conditioned on the block label assigned and $p(y^{(i)})$ is the probability of each class (crack or non-crack). Since $p(x^{(i)} \mid y^{(i)})$ is difficult to estimate and to visualize, as it depends on the considered set of features (six features were listed in the previous subsection), the marginal distribution of each feature, $p(x_j^{(i)} \mid y^{(i)})$, will be considered, by computing the histogram associated with both classes.

Figure 17 shows the normalized histograms of the six considered features for each of the two

admissible labels and for the first database (Road1), without applying any pre-processing

technique. This way the response of each feature to the original data is observed, giving a hint

of the most promising features capable of separating the two classes efficiently. However the

reader should bear in mind that after applying pre-processing the order of the most promising

features may be altered.

Figure 17 – Histograms of the six features of crack (red) and non-crack blocks (green): a)

minimum value; b) mean intensity; c) variance; central moments of order d) 3, e) 4 and f) 5. The

vertical axis corresponds to the probability of each pattern to occur. The horizontal axis

corresponds to the feature value. Note that these histograms are normalized.

The information supplied by these graphics shows visually how good each feature is in separating the two classes. However, it is hard to quantify it just by observing the histograms. To determine the ability of each feature $x_j$ to predict the corresponding label $y$, the mutual information (MI) and the correlation coefficient ($R$) defined in [39] were computed. The corresponding results are shown in Table 1.

$$MI(x_j, y) = \sum_{x_j = 1}^{M} \sum_{y = 0}^{1} p(x_j, y) \log \frac{p(x_j, y)}{p(x_j)\, p(y)} \qquad (9)$$

$$R(x_j, y) = \frac{\mathrm{cov}(x_j, y)}{\sqrt{\mathrm{var}(x_j)\, \mathrm{var}(y)}} \qquad (10)$$

In (9) the sum over $x_j$ runs over the $M$ discretized values of the feature.

The first equation concerns the mutual information and the second the correlation coefficient. Since these two measures work with discrete values, each histogram is divided into 100 bins. As a consequence, one point is collected for every interval of 0.2 on the horizontal axis. In terms of the number of points, the crack histogram (red line) most affected by this discretization is that of the minimum value, since it is non-zero only for feature values between -3 and 0, leading to 15 points collected. The most affected non-crack histograms (green line) are those of the fourth and fifth order moments, which also have 15 points collected, since the non-crack histogram is non-zero over a range of approximately 3. In these three graphs the histogram of the respective label is much steeper than the histogram of the other label. Still, the number of bins was high enough to characterize the slopes well, as observed in Figure 17.

The values of the correlation coefficient can vary between -1 and 1. The bigger the absolute value, the more correlated the two variables ($x_j$ and $y$) are and the better the feature. Each logarithmic term of the mutual information can vary between $-\infty$ (if $p(x_j, y) = 0$) and $+\infty$ (if $p(x_j) = 0$ or $p(y) = 0$). The bigger the mutual information value, the more information can be extracted from the feature.
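Both ranking measures can be sketched with NumPy as shown below. Several choices are assumptions rather than details given in the text: the natural logarithm is used, the bins span the observed range of the feature, and empty histogram cells are skipped (the usual convention that 0 log 0 = 0). The function names are illustrative.

```python
import numpy as np


def mutual_information(feature, labels, bins=100):
    """Mutual information between a binned feature and a binary label (0/1)."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels, dtype=int)
    edges = np.histogram_bin_edges(feature, bins=bins)
    # Bin index (0 .. bins-1) of every sample.
    idx = np.clip(np.digitize(feature, edges[1:-1]), 0, bins - 1)
    joint = np.zeros((bins, 2))
    np.add.at(joint, (idx, labels), 1)
    joint /= joint.sum()
    p_x = joint.sum(axis=1, keepdims=True)
    p_y = joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0  # assumption: empty cells contribute nothing
    ratio = joint[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(joint[nonzero] * np.log(ratio)))


def correlation(feature, labels):
    """Correlation coefficient between a feature and the label."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels, dtype=float)
    cov = np.mean((feature - feature.mean()) * (labels - labels.mean()))
    return float(cov / np.sqrt(feature.var() * labels.var()))
```

A feature that takes low values for crack blocks (label 1) and high values for non-crack blocks gives a negative correlation, while a feature that is independent of the label gives a mutual information of zero.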

Table 1 – Mutual information and correlation coefficient of the proposed features for the first database.

Feature                   MI       R
Minimum value (mip)       0.120   -0.49
Mean                      0.006   -0.07
Variance                  0.089    0.52
Third order moment        0.134    0.63
Fourth order moment       0.112    0.69
Fifth order moment        0.146   -0.70

Although mutual information and correlation coefficient are both quality measures used to rank features, their outputs can differ, as observed in Table 1. However, both measures ranked feature number six (fifth order moment) as the best feature for the original data. Mean and variance were the worst ranked features. In absolute terms, the correlation coefficient of the variance is higher than, but similar to, that of the first feature (mip), while the mutual information of mip is much higher; therefore the variance was considered the second worst feature. For the four best ranked features (third, fourth and fifth order moments and mip), six scatter diagrams, each one representing a different feature pair, are shown in Figure 18. The features in these scatter diagrams are normalized.

Figure 18 – Scatter diagrams of six feature pairs for the first database: a) mip and third order, b) mip and fourth order, c) mip and fifth order, d) third and fourth order, e) third and fifth order, f) fourth and fifth order. Green corresponds to non-crack blocks and red to crack blocks. The horizontal axis corresponds to the first feature of each pair and the vertical axis to the second.

As observed, the features that have the highest values of mutual information and correlation coefficient do not automatically form the best pair; in fact, the scatter diagrams containing two higher order moments proved to be not as good as expected. The reason is that the information contained in those features is correlated, i.e., two arbitrary features can carry more information together than the two individually most promising features combined. It is difficult to determine the feature pair that best separates the two pattern classes just by looking at these graphics. However, some feature pairs (e.g. mip and third order) seem to divide the two classes properly, leaving a smaller region where the two classes are mixed than other feature pairs (e.g. mip and fourth order), where the mixed region is larger.

It is possible to use more than two features at the same time. In fact, all features can be used together, leading to a higher dimensional feature space that would be hard to visualize. The purpose of these graphics was to gain some intuition about how well the features suit the data.

Note that Figure 18 illustrates all the blocks of all Road1 images. A supervised classifier requires a training set and a testing set, so part of these data will compose the training set and the remaining data the testing set.

3.5 Classifier

This section starts with the training stage procedure, along with important details capable of improving the overall system performance. A brief explanation of how the testing phase works is then given. Finally, the strategy applied to select the training set and the testing set is described, followed by the SVM parameters selected to train the system.

Since SVM is a supervised learning algorithm, the first stage of the classifier is a training one.

After choosing and extracting the features from the training set and supplying the respective

ground-truth, it is necessary to select also the kernel type and the kernel parameters (see

section 2.4.2.4 for more detail). Then the classifier computes the decision boundary that

separates best the two classes, being used later to classify the testing set samples.

An important factor to establish in the training stage is the balance between crack and non-crack samples. If the two classes differ greatly in the number of samples used for training, the classifier may not achieve optimum performance, giving instead higher priority to correctly classifying the class that has more samples (non-crack). Several strategies can be considered to handle this problem, known as imbalanced data [40]. For instance, the crack samples can be repeated until they reach the same number as the non-crack samples. Other methods are the selection of a random portion of the non-crack samples (under-sampling) or the creation of new data for the smaller class (over-sampling). One issue with this last technique is that it is data dependent, so it may not apply to every case. Another option is the adjustment of the regularization parameter C. Instead of replicating the crack features to match the number of non-crack features, which leads to heavier computation, the C value can be adjusted for each of the two classes, in such a way that the sum of the weights of all crack samples equals the sum of the weights of all non-crack samples. In this dissertation the strategy selected was to repeat the existing crack samples until they reach the same number as the non-crack samples.
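The repetition strategy can be sketched as follows. The function name is hypothetical, crack blocks are assumed to carry label 1 and to be the minority class, and the order in which crack samples are repeated (cycling through them) is an assumption, since the text only states that they are repeated until both classes have the same size.

```python
import numpy as np


def balance_by_repetition(features, labels):
    """Repeat crack samples (label 1) until they match the non-crack count.

    Assumes at least one crack sample and that crack is the minority class.
    """
    features, labels = np.asarray(features), np.asarray(labels)
    crack = np.flatnonzero(labels == 1)
    non_crack = np.flatnonzero(labels == 0)
    # Cycle through the crack indices until there are as many as non-crack ones.
    repeated = np.resize(crack, non_crack.size)
    keep = np.concatenate([non_crack, repeated])
    return features[keep], labels[keep]


# Toy data: five blocks with two features each, only one of them a crack.
toy_x = np.arange(10).reshape(5, 2)
toy_y = np.array([0, 0, 0, 1, 0])
balanced_x, balanced_y = balance_by_repetition(toy_x, toy_y)
```

The alternative mentioned in the text, adjusting C per class, would instead give the crack class a weight equal to the ratio between the number of non-crack and crack samples, which leaves the training set size unchanged.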

Another measure that helps to improve the system performance is the normalization of all the features, by subtracting the mean and dividing by the standard deviation, i.e.

$$\tilde{x}_j^{(i)} = \frac{x_j^{(i)} - \mu_j}{\sigma_j} \qquad (11)$$

where $i$ stands for the block and $j$ is the feature number. The mean $\mu_j$ and standard deviation $\sigma_j$ of each feature are computed on the feature vectors of the training set.
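A minimal sketch of this normalization is given below. The function names are illustrative; the use of the population standard deviation and the absence of any guard against constant features (zero standard deviation) are assumptions of the sketch.

```python
import numpy as np


def fit_normalizer(train_features):
    """Return the per-feature mean and standard deviation of the training set."""
    train_features = np.asarray(train_features, dtype=float)
    return train_features.mean(axis=0), train_features.std(axis=0)


def normalize(features, mean, std):
    """Subtract the training mean and divide by the training standard deviation."""
    return (np.asarray(features, dtype=float) - mean) / std


# Toy example with two features: statistics come from the training set only.
train = np.array([[1.0, 10.0], [3.0, 30.0]])
mu, sigma = fit_normalizer(train)
train_norm = normalize(train, mu, sigma)
test_norm = normalize([[2.0, 40.0]], mu, sigma)
```

The testing set features are normalized with the statistics obtained from the training set, so that both sets are expressed on the same scale.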

As soon as the system is trained and the SVM model is computed, the testing phase takes

place. The purpose of this stage is to test the system accuracy on detecting cracks as well as

the quality of the crack features that were extracted. To achieve reliable and accurate results,

the testing and training sets should be disjoint.

Just like those of the training set, the images of the testing set are pre-processed in the same way and divided into blocks of 75x75 pixels. After that, the features of each block of the testing set are extracted and normalized. Then the model created in the training stage is applied to all the features extracted from the testing set. In this step the testing set features are mapped to the same space as the training features and the hyperplane computed previously to separate the two classes is used. Depending on the side of the hyperplane on which the testing features fall, the corresponding pattern class is attributed.

The training and testing sets were selected through a cross-validation technique, which provides a good estimate of the average classifier accuracy. Cross-validation divides the database images equally into several folds. Typically the training set is composed of all folds but one and the testing set is composed of the remaining fold. Then several different classifiers are trained and tested (as many as the number of folds), each one with a different fold as testing set. In the end, the system accuracy is computed as the average of the accuracies of the several classifiers.
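The fold construction can be sketched in plain Python. The function names are hypothetical and the way images are assigned to folds (round-robin over the image list) is an assumption; only the principle of training on all folds but one and averaging the accuracies comes from the text.

```python
def image_folds(image_ids, n_folds):
    """Yield (training_ids, testing_ids) pairs, one per fold."""
    # Assumption: images are distributed over the folds in round-robin order.
    folds = [image_ids[k::n_folds] for k in range(n_folds)]
    for k, testing in enumerate(folds):
        training = [img for j, fold in enumerate(folds) if j != k for img in fold]
        yield training, testing


def average_accuracy(fold_accuracies):
    """System accuracy as the mean of the per-fold classifier accuracies."""
    return sum(fold_accuracies) / len(fold_accuracies)


# Using as many folds as images leaves exactly one image out per classifier.
splits = list(image_folds(list(range(56)), n_folds=56))
```

With 56 images and 56 folds, each split trains on 55 images and tests on the remaining one.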

The training and testing stages of the SVM classifier were performed using two functions from an SVM library called LIBSVM. Several parameters were selected from this library. For instance, the kernel type selected to separate the two pattern classes was RBF, since it is the kernel that typically provides better results [38]. The kernel parameter γ was the inverse of the number of features (default value), while the weight chosen for the regularization parameter C was one (default value) for each class, since the numbers of crack and non-crack samples were the same.
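To make the role of γ concrete, the RBF kernel with the default value described above can be written in a few lines of NumPy. This sketch only evaluates the kernel between two feature vectors; it does not reproduce LIBSVM's training procedure, and the constant for six features assumes that the complete feature set of section 3.4 is used.

```python
import numpy as np

N_FEATURES = 6              # assumption: all six features of section 3.4 are used
GAMMA = 1.0 / N_FEATURES    # default described in the text: 1 / number of features


def rbf_kernel(u, v, gamma=GAMMA):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))
```

The kernel equals 1 for identical feature vectors and decays towards 0 as they move apart; a larger γ makes this decay faster. In LIBSVM's command-line tools these choices correspond to the options `-t 2` (RBF kernel), `-g` (γ) and `-c` (C).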

Once all of the testing samples are labeled, a comparison between the output produced by the classifier and the ground-truth dictates the system accuracy. The comparison consists of checking the label assigned by the classifier to each block of the testing set against the respective ground-truth. Then, based on the correct or incorrect decision of the classifier for each block, the system performance is computed as described in Section 4.3. In practice, the comparison between the system output and the ground-truth is only made after applying the post-processing. The next section describes the post-processing used and its contribution to improving the system performance.

Note that the ground-truth is always necessary for training, so that the system knows which features correspond to crack or non-crack blocks. Without this information it would not be possible to compute the decision boundary and separate the two classes. The testing stage, however, does not require any ground-truth, since it applies the model created in the training stage and labels each testing block based on it. The ground-truth is only used in this stage for comparison with the system output, to measure the system performance.

Note also that the system is only trained once for each feature set to create the respective model, but it can be tested on several different testing sets. So even if the training stage is computationally heavy due to its large number of samples, this effort is only made once.

3.6 Post-processing

After classifying all the testing set blocks with one label (0 or 1), a very simple and intuitive post-processing is applied: cleaning the isolated blocks classified as containing cracks. Due to the block size chosen (75x75 pixels) and the elongated shape that cracks normally have, it would be unusual for an isolated block to contain a crack. It is more likely to be noise and therefore a misclassified block. This procedure proved to be very effective at erasing misclassified blocks, therefore improving the system performance, as shown in Figure 19. However, one image of the database is degraded by this operation, as shown in Figure 20. The top image corresponds to the ground-truth, the middle image to the classification output and the bottom image to the output after post-processing. As observed, post-processing cleans most of the blocks that were incorrectly classified as containing crack pixels, but it also erases two blocks labeled as cracks in the ground-truth information, because they were isolated.
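The removal of isolated crack blocks can be sketched with NumPy as shown below. The text does not say which neighbourhood defines an isolated block; the sketch assumes the eight surrounding blocks (including diagonals), and the function name is illustrative.

```python
import numpy as np


def remove_isolated_blocks(labels):
    """Set to 0 every crack block (1) that has no crack block among its 8 neighbours.

    Assumption: a block is isolated when none of its eight surrounding blocks
    is labeled as crack; blocks outside the image count as non-crack.
    """
    labels = np.asarray(labels, dtype=int)
    padded = np.pad(labels, 1, mode="constant")
    h, w = labels.shape
    # Sum of the labels in the 3x3 window around each block, minus the centre.
    window_sum = sum(padded[r:r + h, c:c + w] for r in range(3) for c in range(3))
    neighbours = window_sum - labels
    return np.where((labels == 1) & (neighbours == 0), 0, labels)


# Toy label grid: one isolated crack block and one pair of adjacent crack blocks.
grid = np.array([[1, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 1, 1],
                 [0, 0, 0, 0]])
cleaned = remove_isolated_blocks(grid)
```

In the toy grid the block in the top-left corner is erased, while the two adjacent crack blocks are kept. As noted above for Figure 20, the same rule also erases genuine crack blocks when they happen to be isolated.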

Figure 19 – Example that shows the improvement of the post-processing technique. The top

image corresponds to the ground-truth, the middle image to the classifier output and the bottom

image to the classifier output after post-processing.

Figure 20 – Example in which the post-processing technique eliminates crack blocks, leading to worse results. The top image corresponds to the ground-truth, the middle image to the classifier output and the bottom image to the classifier output after post-processing.

Some images, at first sight, present less positive results, as illustrated in Figure 21, since several blocks were wrongly detected as cracks. However, it can be seen that, for a few blocks, the classifier actually discovered some crack evidence despite those blocks not being labeled as such in the ground-truth. This issue concerns where the specialist who labels each block places the boundary of the crack definition, which can differ from one specialist to another. Although the ground-truth is sometimes subjective, the blocks classified by the system were always compared with the ground-truth data. Still, the post-processing helped to clean some of the blocks labeled as non-crack in the ground-truth, improving the classifier performance.

Figure 21 – Example of a doubtful case where the classifier detects blocks that have crack evidence but are not classified as such in the ground-truth. The top image corresponds to the ground-truth, the middle image to the classifier output and the bottom image to the classifier output after post-processing.

Figure 22 shows two examples of non-crack images. This kind of image can have a strong impact on the overall results, since there are only two possible outcomes: 100% right or 100% wrong. It only takes one block falsely classified as crack to significantly affect the overall classification results, unlike what happens for images containing cracks, which can have several blocks wrongly classified as cracks and still produce satisfactory results. After post-processing these images, the system achieved a 100% correct classification, significantly improving the overall system performance.

Figure 22 – Examples of non-crack images that were correctly classified after post-processing. The right column images are the ground-truth (in this case it coincides with the original image) and the left column images are the classifier output. After post-processing, the block classified as crack in each image is eliminated.
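The removal of isolated crack blocks can be illustrated with a minimal Python sketch. It assumes that "isolated" means a crack block with no crack block among its eight neighbours; this neighbourhood rule, the function name and the 0/1 label encoding are assumptions made for illustration and may differ from the post-processing rule actually used in this dissertation.

```python
def remove_isolated_crack_blocks(labels):
    """Return a copy of a 2-D grid of block labels (1 = crack, 0 = non-crack)
    where crack blocks without any crack block among their 8 neighbours
    are relabeled as non-crack. The 8-neighbour rule is an assumption."""
    rows = len(labels)
    cols = len(labels[0]) if rows else 0
    cleaned = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != 1:
                continue
            # Look at the (up to) 8 surrounding blocks in the ORIGINAL grid.
            has_crack_neighbour = any(
                labels[rr][cc] == 1
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if not has_crack_neighbour:
                cleaned[r][c] = 0
    return cleaned


# Hypothetical classifier output: one isolated block and one two-block crack.
grid = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
cleaned_grid = remove_isolated_crack_blocks(grid)
```

In a non-crack image such as those of Figure 22, a single falsely detected block has no crack neighbour, so under this rule it would be cleared while connected crack blocks are kept.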


4 System evaluation

This chapter is structured as follows. The first two sections describe the test conditions of the simulations run on each of the two databases. The third section presents the evaluation metrics applied to the developed system, and the last section presents the results achieved for each of the two databases.

4.1 Test conditions - Road1

Several experiments were run on the first database (Road1), involving several different feature sets.

The first simulations were made without any pre-processing technique and were then repeated with several pre-processing configurations (see section 3.3), to compare the results with and without pre-processing and to identify the most promising feature sets and pre-processing techniques.

A cross validation approach was applied in each test, always leaving one image out to form the testing set. This led to 56 different classifiers being trained in each experiment.
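The leave-one-image-out scheme can be sketched in Python as follows. The function name and the integer image identifiers are illustrative assumptions; only the count of 56 training rounds comes from the text above.

```python
def leave_one_image_out(image_ids):
    """Yield (train_ids, test_id) pairs, holding out exactly one image per round."""
    for i, test_id in enumerate(image_ids):
        train_ids = image_ids[:i] + image_ids[i + 1:]
        yield train_ids, test_id


# 56 rounds, hence 56 classifiers, one per held-out image.
road1_ids = list(range(56))  # hypothetical identifiers for the Road1 images
splits = list(leave_one_image_out(road1_ids))
```

Each round trains on the blocks of 55 images and tests on the blocks of the remaining one, so consecutive training sets differ by a single image.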

4.2 Test conditions - Road2

All the simulations made on the first database were repeated for the second one, with the same pre-processing configurations and parameters, so that the results of the two databases can be compared.

The leave-one-image-out strategy was not applied to the second database. Since this database has more images (165) than the first one, placing all images but one in the training set would be excessive. In addition, the second database has a higher resolution than the first one, leading to a higher number of features to analyze per image and to considerably longer model training times. A simulation test to find a smaller training set that still gives results similar to those of the leave-one-image-out technique, for the second database, is presented in Figure 23. For each training set size (1 to 164 randomly selected images), the test reports the average of 10 different simulations.

The horizontal axis represents the number of images in the training set, while the vertical axis represents the statistical measure used to evaluate the classifier output (recall for the first graph and f-measure for the second graph). This simulation test used mip and third order moments as features, without pre-processing.

Figure 23 – Performance for different training set sizes (training images are randomly selected).

The horizontal axis represents the number of images of the training set. The vertical axis

represents the evaluation metric (recall for the first graph and f-measure for the second graph).

Figure 23 shows a slight improvement in system performance as the training data increases, in both graphs. Although the results obtained using the f-measure as evaluation metric are very low, the results achieved using recall are much better. This reveals that most of the blocks containing cracks were classified as such, but so were many non-crack blocks. The fact that this database has cracks that are more difficult to detect than those of the first database, and that nearly 50% of its images are non-crack images, helps to explain the poor f-measure results. Since both graphs reach their highest values for training set sizes between 140 and 160, a ten-fold cross validation is applied for the Road2 simulations, which trains each model on 148 or 149 of the 165 images. This way the model created in the training stage still has enough information (training samples) to describe the cracks well, with less computational effort. However, the models computed for Road2 following the ten-fold cross validation will differ more from each other, providing a more robust test of the feature sets and determining more precisely how well these features describe cracks. The same does not happen with the first database, since only one image is exchanged in each iteration of the cross validation (leave one image out), so the models are much more similar to each other than the models computed for the second database.
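A ten-fold split at image level can be sketched in Python as below. The random shuffling, the fixed seed and the function name are assumptions for illustration; the text does not state how the images were assigned to folds.

```python
import random


def k_fold_image_splits(image_ids, k=10, seed=0):
    """Yield (train_ids, test_ids) pairs for a k-fold split at image level.
    Images are shuffled once (assumed), then dealt into k folds."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        test_ids = folds[i]
        train_ids = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train_ids, test_ids


# Road2 has 165 images, so each training set holds 148 or 149 images.
road2_splits = list(k_fold_image_splits(range(165), k=10))
train_sizes = sorted({len(train) for train, _ in road2_splits})
```

With 165 images the training sets fall inside the 140 to 160 range where Figure 23 shows the highest values, while only ten models need to be trained instead of 165.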

4.3 Performance measures

This subsection presents the statistical measures used to evaluate the system output. The following subsection then presents the results achieved in each simulation on the two databases (Road1 and Road2), in Table 2 and Table 3.

Several evaluation metrics were computed to quantify the results, namely the confusion matrix, precision, recall and f-measure (a combination of the precision and recall measures). The confusion matrix is composed of four counts: true positive, false positive, true negative and false negative. True positive is the number of crack blocks that were correctly classified as crack, while false positive is the number of blocks that were classified as crack although the ground-truth labels them as non-crack. False negative is the number of blocks that were classified as non-crack although the ground-truth labels them as crack, and true negative is the number of non-crack blocks that were correctly classified as non-crack.

Precision is the number of crack blocks that were correctly classified (true positive) over the total number of blocks classified as crack (true positive + false positive). Recall is the number of crack blocks that were correctly classified (true positive) over the total number of blocks labeled as crack in the ground-truth (true positive + false negative). Precision can therefore be seen as a quality measure, since it measures how reliable the classifier's detections are, while recall is closer to a quantity measure, since it measures how many of the ground-truth crack blocks were detected. F-measure is a combination of these two metrics and is given by the following expression:

\[
f\text{-}measure = \frac{2 \cdot precision \cdot recall}{precision + recall} \tag{12}
\]
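The precision and recall definitions and expression (12) can be sketched in Python from the confusion matrix counts. The function name, the zero-division convention (returning 0.0) and the example counts are illustrative assumptions, not values from the experiments.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall and f-measure from block counts.
    Returns 0.0 for a metric whose denominator is zero (assumed convention)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical counts: 90 crack blocks detected, 10 false alarms, 30 misses.
p, r, f = precision_recall_f(tp=90, fp=10, fn=30)
```

The true negative count does not enter any of the three metrics, which is why a large number of correctly classified non-crack blocks does not raise the scores.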


4.4 Experimental results

This section presents the crack detection results, without pre-processing as well as with several

pre-processing configurations, measured by the recall, precision and f-measure, for the two

databases considered. These results are shown in Table 2 and Table 3.

Table 2 – Precision (top), recall (middle) and f-measure (bottom) of Road1.

Precision
features  without pre-processing  top-hat  mean & top-hat  min & ahe
Mip3      83.96%                  81.17%   53.86%          27.16%
Mip4      66.46%                  83.43%   50.83%          53.36%
Mip5      83.92%                  84.18%   44.28%          26.98%
MV        62.15%                  77.06%   44.93%          59.77%
MV3       88.11%                  88.11%   88.9%           71.1%
MipMV     62.57%                  71.21%   63.96%          51.55%
MipMV3    87.16%                  88.54%   88.99%          69.72%
MipMV34   88.52%                  88.28%   89.4%           69.53%
MipMV345  88.27%                  87.81%   89.1%           70.26%

Recall
features  without pre-processing  top-hat  mean & top-hat  min & ahe
Mip3      97.62%                  98.06%   94.23%          96.95%
Mip4      97.44%                  97.7%    90.5%           94.04%
Mip5      97.71%                  98.11%   91.1%           96.72%
MV        77.57%                  94.18%   91.49%          86.09%
MV3       98.27%                  98.68%   98.04%          95.71%
MipMV     97.39%                  96.77%   94.62%          93.9%
MipMV3    98.62%                  98.67%   98.23%          97.16%
MipMV34   98.85%                  98.49%   98.11%          97.21%
MipMV345  98.85%                  98.49%   98.18%          97.02%

F-measure
features  without pre-processing  top-hat  mean & top-hat  min & ahe
Mip3      89.8%                   87.66%   67.93%          41.99%
Mip4      77.01%                  88.91%   64.41%          62.95%
Mip5      89.91%                  89.79%   59.24%          41.71%
MV        57.99%                  81.81%   59.28%          63%
MV3       92.56%                  92.73%   92.84%          79.6%
MipMV     76.63%                  80.05%   74.91%          62.25%
MipMV3    92.22%                  92.93%   92.88%          79.4%
MipMV34   93.07%                  92.67%   93.09%          79.25%
MipMV345  92.94%                  92.41%   92.94%          79.77%


Table 3 – Precision (top), recall (middle) and f-measure (bottom) of Road2.

Precision
features  without pre-processing  top-hat  mean & top-hat  min & ahe
Mip3      11.95%                  14.87%   16.75%          5.72%
Mip4      13.7%                   15.93%   19.33%          6.76%
Mip5      14.13%                  15.35%   17.15%          6.06%
MV        8.44%                   16.23%   21.87%          7.35%
MV3       15.44%                  19.8%    21.21%          15.57%
MipMV     13.64%                  17.25%   17.54%          10.09%
MipMV3    15.87%                  22.52%   20.19%          15.21%
MipMV34   19.11%                  26.12%   20.12%          16.93%
MipMV345  24.04%                  26.99%   20.19%          16.87%

Recall
features  without pre-processing  top-hat  mean & top-hat  min & ahe
Mip3      91.64%                  93.36%   93.85%          78.8%
Mip4      91.5%                   91.92%   90.83%          78.32%
Mip5      92.67%                  94.19%   93.93%          79.35%
MV        66%                     81.98%   82.09%          76.64%
MV3       84.04%                  93.02%   92.43%          84.3%
MipMV     90.2%                   90.55%   90.86%          81.42%
MipMV3    91.32%                  94%      92.53%          85.43%
MipMV34   93.66%                  94.29%   92.66%          88.16%
MipMV345  92.59%                  94.29%   92.5%           88.36%

F-measure
features  without pre-processing  top-hat  mean & top-hat  min & ahe
Mip3      20.30%                  24.7%    27.5%           10.2%
Mip4      22.83%                  25.62%   30.1%           11.78%
Mip5      23.42%                  25.46%   28.04%          10.75%
MV        13.9%                   25.13%   31.88%          12.67%
MV3       24.52%                  31.18%   33.23%          24.22%
MipMV     22.78%                  34.84%   28.34%          16.7%
MipMV3    25.85%                  39.24%   31.96%          23.8%
MipMV34   30.37%                  40.37%   31.87%          26.44%
MipMV345  36.56%                  27.82%   31.98%          26.43%

Since it is not possible to know in advance which feature combinations are best, several feature sets were defined and tested. It is possible to merge all features into a single feature vector and use it for classification. However, this strategy may not lead to the best results. Increasing the number of features usually increases the amount of information extracted from the image, but it also increases the dimension of the feature space. Training a classifier in a high-dimensional feature space is more difficult, since it requires a very large data set, which also makes supervised training much harder. This effort can be unnecessary, since it can even lead to worse results. This problem is known as the curse of dimensionality [39].

As a consequence, 9 different feature sets were tested. Table 2 and Table 3 present the results achieved by the feature set configurations considered for the two databases (Road1 and Road2). "Mip" corresponds to the mip feature, "M" and "V" to the mean and variance features, respectively, and each number to the moment of the respective order (3: third order moment, 4: fourth order moment and 5: fifth order moment). Section 3.4.1 presents and describes these six features.
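As an illustration of how such a feature vector could be built for one block, the Python sketch below computes the six features. It assumes that the order moments are central moments normalized by the number of pixels and that the variance is the population variance; these definitions, the function name and the dictionary keys are assumptions, since the exact formulas are given in section 3.4.1 and not repeated here.

```python
def block_features(block):
    """Compute mip, mean, variance and 3rd to 5th order moments of one block.
    `block` is a 2-D list of grey levels. Moments are taken as central
    moments divided by the pixel count (an assumed definition)."""
    pixels = [float(p) for row in block for p in row]
    n = len(pixels)
    mean = sum(pixels) / n

    def central_moment(order):
        return sum((p - mean) ** order for p in pixels) / n

    return {
        "mip": min(pixels),            # minimum intensity pixel
        "mean": mean,
        "variance": central_moment(2),
        "order3": central_moment(3),
        "order4": central_moment(4),
        "order5": central_moment(5),
    }


# Tiny hypothetical 2x2 block; real blocks are much larger.
features = block_features([[0, 2], [2, 4]])
# A feature set such as "MipMV3" then selects a subset of these values.
mipmv3 = [features[k] for k in ("mip", "mean", "variance", "order3")]
```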

Each column of each table represents a different pre-processing configuration. In total, 4 configurations were used: no pre-processing stage, top-hat, mean filter followed by top-hat, and minimum filter followed by adaptive histogram equalization. Section 3.3 presents and discusses these configurations as well as their parameters.

In Table 2, several feature sets achieve an f-measure higher than 90% without any pre-processing. Some other feature sets are not as good but still present a high f-measure, while others present a low value (e.g. the MV feature set, as expected from the analysis made in section 3.4.2). Comparing the results without pre-processing with the pre-processed ones, the top-hat configuration gives similar results, with the exception of Mip4 and MV, which improve considerably, while the mean & top-hat configuration gives much lower results for some feature sets and slightly better results for others. The last pre-processing configuration proved to be a bad choice, showing much worse results.

A significant difference between the evaluation metrics precision and recall (top and middle parts of Table 2, respectively) can be observed. The average precision of each feature set is always lower than 90%, while the average recall of most feature sets is higher than 90%. Since the f-measure is strongly influenced by the lower of the two values, precision is mainly responsible for the f-measure results not being better. In addition, it is stated in [11] that recall is more important than precision for crack detection, i.e. detecting the crack blocks matters more than avoiding non-crack blocks being labeled as cracks. Moreover, the number of crack blocks is much smaller than the number of non-crack blocks, which makes a high precision very difficult to obtain.
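As a worked illustration of how the f-measure is pulled towards the lower of the two metrics, consider hypothetical values (not taken from the tables) of precision 0.30 and recall 0.95, and compare expression (12) with the arithmetic mean:

```latex
\[
f\text{-}measure = \frac{2 \cdot 0.30 \cdot 0.95}{0.30 + 0.95}
                 = \frac{0.57}{1.25} = 0.456,
\qquad
\frac{0.30 + 0.95}{2} = 0.625 .
\]
```

The f-measure stays much closer to the precision than the arithmetic mean does, so a high recall cannot compensate for a low precision.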

Comparing the recall and f-measure results for all configurations, it can be observed that the recall values are consistently higher than the f-measure values, with a best recall of 98.85% and a best f-measure of 93.09%.


In general, the pre-processing configurations tested do not significantly improve the system performance. Considering the mip feature, it is reasonable to assume that the first two configurations (top-hat filter and mean & top-hat) have very little impact on the minimum pixel value of the block, since the mean filter is very small (3x3). Both configurations will, however, affect the several order moment features. The idea was to bring the features of each pattern class closer together while increasing the distance between classes. For the top-hat configuration the changes were few, since it only removes the noise produced by white pixels, and most feature sets keep a similar f-measure or show a small improvement compared with the results without pre-processing. However, the top-hat configuration significantly helped the feature pair MV, since it moved the mean and variance of each crack block to values closer to what is expected from a crack block (low mean and high variance). The non-crack blocks also had their mean reduced, but their variance remained lower than that of the crack blocks. This implies a better separation of the pattern classes.

The mean & top-hat configuration was expected to improve on the results obtained with the top-hat configuration, since it should produce a more uniform image with a higher contrast between the cracks and the background, and therefore a tighter grouping within each pattern class and a larger separation between the pattern classes. However, the opposite effect is observed. Comparing both results, the mean & top-hat recall values were always lower, while only four feature sets achieved a higher precision. This means that the distance between classes became shorter. Given the similarity of the mean histograms of the crack blocks and non-crack blocks (Figure 15), it is not surprising that the mean filter brings the two pattern classes closer together rather than increasing their distance, leading to worse results.

The last configuration presents much worse results than the others. Although it reinforces the crack features and therefore enhances the contrast between the crack and the background, it does the same for other artifacts, which can later be misclassified as crack by the classifier (note that the recall values are only a little lower than those of the other configurations, whereas the precision values are much lower).

To compare the results on both databases and to remain faithful to the methodology followed for the first database, the same simulations with the same parameters were repeated for the second database. The f-measures in Table 3 obtained with the top-hat and mean & top-hat pre-processing configurations were generally better than those achieved without any pre-processing, while the min & ahe configuration proved to be worse.

In quantitative terms, the results achieved on this database are very low. Two factors can contribute to this: the cracks are much harder to detect than those of the first database, and a ten-fold cross validation was used instead of the leave-one-image-out technique, which can weaken the model created in the training stage. In addition, the large share of non-crack images in Road2 (about 50%) can significantly lower the system accuracy, since any block labeled as crack in these images can only harm the system performance.

The testing set was evaluated considering all the images together and not each image individually, i.e. the four counts of the confusion matrix were computed for the entire testing set as if it were a single image, instead of averaging each metric over the images. That way, classifying a block as crack in a non-crack image does not reduce the total true positive count, but the added false positive still degrades the f-measure.
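The difference between pooling the confusion matrix counts over the whole testing set and averaging a per-image metric can be sketched in Python. The counts below are hypothetical, and the function names are assumptions; the f-measure is written in its equivalent count form 2TP / (2TP + FP + FN).

```python
def f_from_counts(tp, fp, fn):
    """f-measure written directly in terms of counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pooled_f(per_image_counts):
    """Sum (tp, fp, fn) over all images first, then compute one f-measure."""
    tp = sum(c[0] for c in per_image_counts)
    fp = sum(c[1] for c in per_image_counts)
    fn = sum(c[2] for c in per_image_counts)
    return f_from_counts(tp, fp, fn)


def per_image_average_f(per_image_counts):
    """Compute the f-measure of each image, then average the values."""
    scores = [f_from_counts(*c) for c in per_image_counts]
    return sum(scores) / len(scores)


# Hypothetical testing set: a crack image (8 TP, 2 FP, 0 FN) and a
# non-crack image with a single false positive (0 TP, 1 FP, 0 FN).
counts = [(8, 2, 0), (0, 1, 0)]
pooled = pooled_f(counts)                    # 16 / 19
averaged = per_image_average_f(counts)       # (16/18 + 0) / 2
```

In this example the single false positive in the non-crack image lowers the pooled f-measure only slightly, whereas in a per-image average that image alone would contribute a score of zero.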

Once again, precision is mainly responsible for these results. The main issue is that these images have a higher resolution and that several blocks are classified as crack when they are not. In addition, the feature sets selected may work well for the first database (since they were specifically adjusted for Road1) but may not be the best for the second database. This suggests, as discussed previously, that texture noise and the type of road pavement can have a significant influence on the final classifier results.

Another relevant observation is that different pre-processing strategies should be used for different feature sets, i.e. it may not make sense to use the same pre-processing for all the feature configurations. For instance, Oliveira and Correia used in [6] a normalization pre-processing followed by a top-hat filter, applying mean and standard deviation as features. In this dissertation the mean and variance features as well as the top-hat filter are used, showing a significant improvement compared with the original data results (without pre-processing) for both databases. Therefore, a more careful and dedicated pre-processing selection for each feature set could lead to better crack detection results. Table 4 presents the results of isolated features without pre-processing, while Table 5 presents the best results achieved using a distinct pre-processing for each feature. Then, the best recall and f-measure of several different combinations of features and pre-processing, forming a joint combination, are presented in Table 6, where each feature was pre-processed by the technique on the same line. The results in these three tables concern the first database.

Table 4 – Results of isolated features without pre-processing for the first database.

features recall precision f-measure

mip 97.26% 67.49% 77.88%

mean 76.18% 7.84% 13.52%

variance 80.34% 46.67% 44.86%

order3 88.28% 89.17% 86.39%

order4 85.38% 59.58% 60.33%

order5 92.48% 95.86% 93.91%


Table 5 – Results of isolated features with pre-processing for the first database.

pre-processing features recall precision f-measure

AHE mean 69.28% 14.07% 22.88%

top-hat variance 90.04% 78.99% 80.90%

top-hat order3 97.88% 82.86% 88.68%

mean order3 92.32% 94% 92.57%

top-hat order4 97.79% 83.54% 89.02%

top-hat order5 98.11% 83.77% 89.54%

mean order5 95.30% 94.09% 94.53%

Table 6 – Best joint recall (top) and best joint f-measure (bottom) achieved for the first database.

Best joint recall
pre-processing  features  recall  precision  f-measure
none            order5    99.04%  87.59%     92.64%
top-hat         order3
top-hat         order5
mean            order5

none            order5    99.04%  88.85%     93.31%
top-hat         variance
top-hat         order3
top-hat         order5
mean            order5

none            mip       98.98%  89.12%     93.48%
top-hat         variance
mean            order3
top-hat         order4
top-hat         order5
mean            order5

Best joint f-measure
pre-processing  features  recall  precision  f-measure
top-hat         variance  97.86%  91.32%     94.18%
AHE             mean
mean            order5

none            mip       98.71%  90.20%     93.97%
AHE             mean
top-hat         variance
mean            order3
top-hat         order4
none            order5
top-hat         order5

none            mip       98.71%  90.16%     93.94%
AHE             mean
top-hat         variance
mean            order3
top-hat         order4
top-hat         order5


A significant improvement is observed when comparing Table 4 with Table 5, i.e. when a specific pre-processing is applied to a specific feature, the system performance improves in comparison with the original data results (no pre-processing). Furthermore, the first part of Table 6 presents the best recall values, which are superior not only to the recall values presented in Table 4 and Table 5, but also to the best recall previously achieved for the first database (98.85%). However, this comes at the cost of a lower f-measure than that of some of the feature sets of both tables. Even the best f-measure values, presented in the second part of Table 6, which are higher than the best f-measure previously achieved for the first database (93.09%), are lower than the f-measure achieved by the feature order5 pre-processed with a mean filter in Table 5. However, as already stated in this dissertation, this has little impact on crack detection, since recall is more important than precision: it reflects how completely the cracks are detected, even if more non-crack blocks are sometimes misclassified as crack.

Therefore, by matching the right pre-processing to the right feature, the system performance can be improved beyond the previous results, since the recall achieved is higher than that obtained with the original data or with a generic pre-processing applied blindly to all feature sets.


5 Conclusions and Future Work

This dissertation proposes a system for the detection of cracks in road images based on the

SVM classifier. Since SVM is a learning technique, the database is divided into two sets, one for

training the system and another for testing the system. Two different databases (Road1 and

Road2) were used to evaluate the system.

It is not always possible to compare the results obtained with those of other papers, since some of them do not present quantitative results using the standard evaluation metrics. Often only qualitative results or other evaluation metrics are provided, making it harder to compare them with the results produced in this dissertation. For instance, the distinction between crack images and non-crack images is the evaluation metric used in [17] and [12]. In [26] two supervised techniques are used to classify the crack type, so a comparison of results is not meaningful. Other papers, like [20] and [21], only state qualitative results, which makes a comparison very difficult.

However, other papers provide statistical results about crack detection, which makes a comparison with the results achieved here possible. In [10] the best f-measure achieved was 93.8% (with a recall of 96.3% and a precision of 86.9%). A recall of 93.96% and a precision of 90.70%, producing an f-measure of 91.95%, is stated in [11] as the best result. However, the same paper also reports a lower f-measure with a recall of 96.75%, placing more emphasis on recall than on the f-measure. A supervised parametric algorithm obtained the best result in [6], with an f-measure of 94.7% and a recall of 97% (also the highest obtained in that paper). In [5] the best recall was 95.44% for the first database and 85.44% for the second database, using the minimum intensity pixel (mip) as feature in both cases.

The best recall achieved for Road1 and Road2 was 98.85% and 94.24%, respectively. The Road1 recall clearly outperforms the recall values reported in the literature, but the same does not happen for the second database, despite its good result. In terms of f-measure, the Road1 results can compete with those reported in the literature, while the Road2 results are far from them due to a very low precision. Therefore, the results achieved on both databases can compete with those presented in the literature in terms of recall, while only the Road1 f-measure results match the respective literature results.

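The best results achieved for the first database with feature-specific pre-processing reach a recall of 99.04%, a precision of 95.86% and an f-measure of 94.53% (the best value of each metric in Tables 4 to 6, not a single configuration), surpassing the previous Road1 results.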

The comparison between the literature results and the best system results is shown in Table 7.


Table 7 – Comparison of the literature results with the results achieved in the developed system.

Evaluation   Literature results (%)                          Best system results (%)
metrics      [10]    [11]    [11]    [6]     [5]     [5]     Road1   Road2   Joint results
Recall       96.30   93.96   96.75   97      95.44   85.44   98.85   94.24   99.04
Precision    86.9    90.70   -       -       -       -       89.40   26.99   95.86
f-measure    93.8    91.95   -       94.7    -       -       93.09   40.37   94.53

As future work, different batteries of tests, with different configurations of specific pre-processing for each feature, are proposed as a valid way to improve the system performance for larger and more challenging databases, such as Road2. New pre-processing techniques that could provide good results include the median filter or morphological filters [14] to enhance cracks, and anisotropic diffusion filtering [10] to smooth the image texture variation.

Multi-scale features could also be tested (e.g. wavelet coefficients or Gabor filters), since cracks have different thicknesses. These features are typically used in pixel based approaches and are not normally applied in block based ones, so their results in this approach are unknown.

Selecting the SVM parameter set that most improves the system, after testing several SVM parameters by trial and error [38], is also proposed as future work. An important parameter here is the regularization parameter C, since it can save computational time and improve the system performance. For instance, correctly adjusting the regularization parameter C for each of the two classes is computationally cheaper than replicating the crack features until they reach the number of non-crack features (the approach used in this dissertation). Another measure against imbalanced data that can benefit the system performance and reduce the computational effort is the random selection of non-crack features (under-sampling) so that their number matches the number of crack features, with the parameter C equal for both pattern classes.
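The three imbalance strategies mentioned above can be contrasted with a short Python sketch. All function names are hypothetical, the feature vectors are placeholders, and the rule of setting the crack class weight to the ratio of class sizes is a common heuristic assumed here, not a value taken from this dissertation. In LIBSVM, per-class weights of C are set through the `-wi` options of `svm-train`.

```python
import random


def oversample_crack(crack, non_crack):
    """Replicate crack feature vectors until they match the non-crack count
    (the replication approach the text reports using)."""
    if not crack or len(crack) >= len(non_crack):
        return list(crack)
    reps, rest = divmod(len(non_crack), len(crack))
    return list(crack) * reps + list(crack[:rest])


def undersample_non_crack(crack, non_crack, seed=0):
    """Randomly keep only as many non-crack vectors as there are crack vectors."""
    if len(non_crack) <= len(crack):
        return list(non_crack)
    return random.Random(seed).sample(list(non_crack), len(crack))


def crack_weight(n_crack, n_non_crack):
    """Assumed heuristic: weight C of the crack class by the class size ratio."""
    return n_non_crack / n_crack


# Placeholder data: 5 crack vectors and 50 non-crack vectors.
crack_vectors = [[float(i)] for i in range(5)]
non_crack_vectors = [[float(i)] for i in range(100, 150)]

oversampled = oversample_crack(crack_vectors, non_crack_vectors)      # 50 vectors
undersampled = undersample_non_crack(crack_vectors, non_crack_vectors)  # 5 vectors
weight = crack_weight(len(crack_vectors), len(non_crack_vectors))     # 10.0
```

Replication grows the training set to 100 vectors in this example, under-sampling shrinks it to 10, and class weighting leaves it at 55, which is why the last two options reduce the training effort.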


6 References

[1] M. Gavilán, D. Balcones, O. Marcos, D. F. Llorca, M. A. Sotelo, I. Parra, M. Ocaña, P.

Aliseda, P. Yarza and A. Amírola, "Adaptive Road Crack Detection System by Pavement

Classification," Sensors, vol. 11, no. 10, pp. 9628-9657, 2011.

[2] G. Moussa and H. Hussain, "A New Technique for Automatic Detection and Parameters

Estimation of Pavement Crack," in Proceedings of the 4th International Multi-Conference on

Engineering and Technological Innovation, Orlando, Florida, USA, 2011.

[3] JAE, Catálogo de Degradações dos Pavimentos Rodoviários Flexíveis, 1997.

[4] J. Lin and Y. Liu, "Potholes Detection Based on SVM in the Pavement Distress Image," in

Proceedings of the 9th International Symposium on Distributed Computing and Applications

to Business, Engineering and Science, Hong Kong, China, 2010.

[5] P. Rosa and P. Correia, "Automatic Road Pavement Crack Detection Using Boosting

Classifiers," in Proceedings of the European Signal Processing Conference - EUSIPCO,

Glasgow, United Kingdom, 2009.

[6] H. Oliveira and P. L. Correia, "Supervised Crack Detection and Classification in Images of

Road Pavement Flexible Surfaces," in Recent Advances in Signal Processing, Austria, In-

Tech, 2009.

[7] L. Jing and Z. Aiqin, "Pavement crack distress detection based on image analysis," in

Proceedings of the 2010 International Conference on Machine Vision and Human-machine

Interface, Kaifeng, China, 2010.

[8] T. Nguyen, M. Avila and S. Begot, "Automatic detection and classification of defect on road

pavement using anisotropy measure," in Proceedings of the 17th European Signal

Processing Conference, Glasgow, Scotland, 2009.

[9] H. Oliveira and P. L. Correia, "Automatic Road Crack Segmentation using entropy and

image dynamic thresholding," in Proceedings of the European Signal Processing

Conference - EUSIPCO, Glasgow, United Kingdom, 2009.

[10] H. Oliveira and P. L. Correia, "Automatic Crack Detection on Road Imagery using

Anisotropic Diffusion and Region Linkage," in Proceedings of the European Signal

Processing Conference - EUSIPCO, Aalborg, Denmark, 2010.

[11] H. Oliveira and P. L. Correia, "Evaluation of Pre-processing in Road Pavement Image

Analysis," in Proceedings of the Conference on Telecommunications - ConfTele, Santa

Maria da Feira, Portugal, 2009.

[12] N. Sy, M. Avila, S. Begot and J. Bardet, "Detection of Defects in Road Surface by a Vision

System," in Proceedings of the 14th IEEE Mediterranean Electrotechnical Conference,

Ajaccio, France, 2008.


[13] S. Chambon and J.-M. Moliard, "Automatic Road Pavement Assessment with Image

Processing: Review and Comparison," International Journal of Geophysics, vol. 2011, p.

20, 2011.

[14] G. Bao, "Road Distress Analysis using 2D and 3D Information," to obtain the master degree

of science in Electrical Engineering, University of Toledo, 2010.

[15] B. Augereau, B. Tremblais, M. Khoudeir and V. Legeay, "A differential approach for fissures

detection on road surface images," in Proceedings of the 5th International Conference on

Quality Control by Artificial Vision, Le Creusot, France, 2001.

[16] Q. Zou, Y. Cao, Q. Li, Q. Mao and S. Wang, "CrackTree: Automatic crack detection from

pavement images," Pattern Recognition Letters, vol. 33, no. 3, pp. 227-238, 2011.

[17] N. Tanaka and K. Uematsu, "A crack detection method in road surface images using

morphology," in Proceedings of the Workshop on Machine Vision Applications, Makuhari,

Chiba, Japan, 1998.

[18] H. Cheng, J. Wang, Y. Hu, C. Glazier, X. Shi and X. Chen, "Novel approach to pavement

cracking detection based on neural network," Transportation Research Record, vol. 1764,

pp. 119-127, 2001.

[19] A. Ayenu-Prah and N. Attoh-Okine, "Evaluating pavement cracks with bidimensional

empirical mode decomposition," EURASIP Journal on Advances in Signal Processing, vol.

2008, 2008.

[20] P. Subirats, O. Fabre, J. Dumoulin, V. Legeay and D. Barba, "A combined wavelet-based

image processing method for emergent crack detection on pavement surface images," in

Proceedings of the 12th European Signal Processing Conference EUSIPCO, Vienna,

Austria, 2004.

[21] P. Subirats, J. Dumoulin, V. Legeay and D. Barba, "Automation of pavement surface crack

detection with a matched filtering to define the mother wavelet function used," in

Proceedings of the 14th European Signal Processing Conference (EUSIPCO), Florence,

Italy, 2006.

[22] S. Zhibiao and G. Yanqing, "Algorithm on contourlet domain in detection of road cracks for

pavement images," in Proceedings of the 9th International Symposium on Distributed

Computing and Applications to Business, Engineering and Science, Hong Kong, 2010.

[23] C.-X. Ma, C.-X. Zhao and Y.-K. Hou, "Pavement distress detection based on

nonsubsampled contourlet transform," in Proceedings of the International Conference on

Computer Science and Software Engineering, Wuhan, China, 2008.

[24] R. Medina, J. Gómez-García-Bermejo and E. Zalama, "Automated Visual Inspection of

Road Surface Cracks," in Proceedings of the 27th International Symposium on Automation

and Robotics in Construction, Bratislava, Slovakia, 2010.

[25] P. Subirats, J. Dumoulin, V. Legeay and D. Barba, "Automation of pavement surface crack


detection using the continuous wavelet transform," in Proceedings of the IEEE international

Conference on Image Processing, Atlanta, GA, 2006.

[26] N. Li, X. Hou, X. Yang and Y. Dong, "Automation Recognition of Pavement Surface

Distress Based on Support Vector Machine," in Proceedings of the 2009 Second

International Conference on Intelligent Networks and Intelligent Systems, Tianjin, China, 2009.

[27] Y. Huang and B. Xu, "Automatic inspection of pavement cracking distress," Journal of Electronic Imaging, vol. 15, p. 013017, 2006.

[28] C. Jiang-wei, C. Xiu-min, W. Rong-ben and S. Suming, "Research on Asphalt Pavement

Surface Distress Image Feature Extraction Method," Journal of Image and Graphics, vol. 8(A), no. 10, pp. 1211-1217, 2003.

[29] B. J. Lee and H. D. Lee, "A robust position invariant artificial neural network for digital

pavement crack analysis," in Proceedings of the TRB Annual Meeting, Washington, DC,

USA, 2003.

[30] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions

on Systems, Man and Cybernetics, vol. 9, pp. 62-66, 1979.

[31] H. Elbehiery, A. Hefnawy and M. Elewa, "Surface Defects Detection for Ceramic Tiles

Using Image Processing and Morphological Techniques," in Proceedings of the World

Academy of Science, Engineering and Technology, 2005.

[32] J. Chou, W. O’Neill and H. Cheng, "Pavement distress evaluation using fuzzy logic and

moments invariants," Transportation Research Record, pp. 39-46, 1995.

[33] H. Lin, J.-W. Zhao, Q.-S. Chen, J.-R. Cai and P. Zhou, "Eggshell Crack Detection Based on

Acoustic Impulse Response and Supervised Pattern Recognition," Czech Journal of Food

Sciences - UZEI, vol. 27, pp. 393-402, 2009.

[34] M. Nieniewski, L. Chmielewski, A. Józwik and M. Sklodowski, "Morphological Detection and

Feature based Classification of Cracked regions in Ferrites," Machine Graphics and Vision,

vol. 8, pp. 699-712, 1999.

[35] A. Ramdas, "Bootstrapping, Adaboosting, Uncertainty Sampling for Genre Classification of

Fine Art Paintings," 10 December 2011.

[36] H. Furuta and H. Hattori, "Damage Assessment of Reinforced Concrete Bridge Decks using

Adaboost," in 3rd International ASRANet Colloquium, Glasgow, UK, 2006.

[37] "Wikipedia," [Online]. Available: http://en.wikipedia.org/wiki/Support_vector_machine.

[Accessed 21 8 2012].

[38] C.-W. Hsu, C.-C. Chang and C.-J. Lin, "A Practical Guide to Support Vector Classification,"

15 4 2010. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/. [Accessed 2012].

[39] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," The Journal

of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.

[40] B. X. Wang and N. Japkowicz, "Boosting Support Vector Machines for Imbalanced Data

Sets," in Proceedings of the 17th International Conference on Foundations of Intelligent

Systems, Springer-Verlag, Berlin, Heidelberg, 2008.

[41] D.-q. Zhang, S.-r. Qu, W.-b. Li and L. He, "Image Enhancement Algorithm on Ridgelet

Domain in Detection of Road Cracks," China Journal of Highway and Transport, vol. 22, no.

2, pp. 26-31, 2009.

[42] H. N. Koutsopoulos and A. B. Downey, "Primitive-based classification of pavement

cracking images," Journal of Transportation Engineering, vol. 119, no. 3, pp. 402–418,

1993.

[43] L. Gang, H. Yu-yao and Z. Yan, "Automatic Recognition Algorithm of Pavement Defect

Image Based on OTSU and Maximizing Mutual Information," Microelectronics & Computer,

vol. 26, no. 7, pp. 241-247, 2009.

[44] W. Xiao and ZhangXue, "A New Method for Distress Automation Recognition of Pavement

Surface Based on Density Factor and Image Processing," Journal of Transportation

Engineering and Information, pp. 82-89, 2004.

[45] J. Zhou, P. Huang and F.-P. Chiang, "Wavelet-Based Pavement Distress Classification,"

Transportation Research Record: Journal of the Transportation Research Board, pp. 89-98,

2005.

[46] N. Nishimura and S. Kobayashi, "A boundary integral equation for an inverse problem

related to crack detection," International Journal for Numerical Methods in Engineering,

vol. 32, no. 7, pp. 1371–1387, 1991.

[47] M. Kaseko, Z. Lo and S. G. Ritchie, "Comparison of Traditional and Neural Classifiers for

Pavement Crack Detection," Journal of Transportation Engineering, vol. 120, pp. 552–569,

1994.

[48] F. M. Nejad and H. Zakeri, "An optimum feature extraction method based on Wavelet–

Radon Transform and Dynamic Neural Network for pavement distress classification,"

Expert Systems with Applications, vol. 38, no. 8, pp. 9442-9460, 2011.

[49] J. Shirataki and T. Tomikawa, "A study of road crack detection by image processing,"

Research Reports of Kanagawa Institute of Technology, vol. 24, pp. 67-71, 2000.

[50] H. Lee and H. Oshima, "New crack-imaging procedure using spatial autocorrelation

function," Journal of Transportation Engineering, vol. 120, no. 2, pp. 206–228, 1994.

[51] E. Teomete, V. R. Amin, H. Ceylan and O. Smadi, "Digital image processing for pavement

distress analyses," in Proceedings of the Mid-Continent Transportation Research

Symposium, Ames, Iowa, 2005.

[52] M. N. Do and M. Vetterli, "Contourlets: A Directional Multiresolution Image

Representation," in Proceedings of the International Conference on Image Processing,

2002.
