research article automatic classification of specific...

18
Research Article Automatic Classification of Specific Melanocytic Lesions Using Artificial Intelligence Joanna Jaworek-Korjakowska and PaweB KBeczek Department of Automatics and Biomedical Engineering, AGH University of Science and Technology, Aleja Mickiewicza 30, 30-059 Krakow, Poland Correspondence should be addressed to Joanna Jaworek-Korjakowska; [email protected] Received 20 November 2015; Revised 23 December 2015; Accepted 24 December 2015 Academic Editor: Yudong Cai Copyright © 2016 J. Jaworek-Korjakowska and P. Kłeczek. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background. Given its propensity to metastasize, and lack of effective therapies for most patients with advanced disease, early detection of melanoma is a clinical imperative. Different computer-aided diagnosis (CAD) systems have been proposed to increase the specificity and sensitivity of melanoma detection. Although such computer programs are developed for different diagnostic algorithms, to the best of our knowledge, a system to classify different melanocytic lesions has not been proposed yet. Method. In this research we present a new approach to the classification of melanocytic lesions. is work is focused not only on categorization of skin lesions as benign or malignant but also on specifying the exact type of a skin lesion including melanoma, Clark nevus, Spitz/Reed nevus, and blue nevus. e proposed automatic algorithm contains the following steps: image enhancement, lesion segmentation, feature extraction, and selection as well as classification. Results. e algorithm has been tested on 300 dermoscopic images and achieved accuracy of 92% indicating that the proposed approach classified most of the melanocytic lesions correctly. Conclusions. A proposed system can not only help to precisely diagnose the type of the skin mole but also decrease the amount of biopsies and reduce the morbidity related to skin lesion excision. 1. Introduction Body organs are not all internal like the heart, brain, or liver. ere is one we wear on the outside which protects us from extremes of temperature, damaging sunlight, and harmful chemicals. Skin is our largest organ which in adolescence covers about 2 square meters and weights 3.6 kilograms [1]. Like in other parts of the body, skin cells can also grow abnormally, causing cancerous tumors to form. e most popular and widely used method to analyze skin changes is to observe them under a skin surface micro- scope. e analysis of skin with a microscope started in 1663 by Kolhauser who observed vessels of nail matrix. In 1958, aſter many years of research, the first portable dermatoscope has been produced [2]. Goldman was the first dermatologist to coin the term “dermascopy” and to use the dermatoscope to evaluate pigmented cutaneous lesions. Nowadays the examination of the small moles is possible through a digital epiluminescence microscopy (ELM, also dermoscopy or dermatoscopy) which is a noninvasive, in vivo technique that, by employing the optical phenomenon of oil immer- sion, makes subsurface structures of the skin accessible for examination in an optic magnification between 10 to 40 times [2]. Figure 1 presents comparison between the skin mole observed with the naked eye and with a dermoscope. Dermoscopy enables clinicians to observe global and local structures very precisely and thus provides the additional criteria for the clinical diagnosis of pigmented skin lesion. is paper is organized in four sections as follows. Section 1 covers background information on the nature of skin cancer, presents the clinical definition of different mel- anocytic lesions, and describes the motivation and state of the art. Section 2 specifies melanocytic lesions classification algorithm, including the following steps: preprocessing, seg- mentation, feature extraction, feature selection, and classifi- cation method. In Section 3, the conducted tests and results are described. Section 4 closes the paper and highlights future directions. Hindawi Publishing Corporation BioMed Research International Volume 2016, Article ID 8934242, 17 pages http://dx.doi.org/10.1155/2016/8934242

Upload: others

Post on 06-Sep-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

Research ArticleAutomatic Classification of Specific Melanocytic Lesions UsingArtificial Intelligence

Joanna Jaworek-Korjakowska and PaweB KBeczek

Department of Automatics and Biomedical Engineering AGH University of Science and Technology Aleja Mickiewicza 3030-059 Krakow Poland

Correspondence should be addressed to Joanna Jaworek-Korjakowska jaworekaghedupl

Received 20 November 2015 Revised 23 December 2015 Accepted 24 December 2015

Academic Editor Yudong Cai

Copyright copy 2016 J Jaworek-Korjakowska and P Kłeczek This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

Background Given its propensity to metastasize and lack of effective therapies for most patients with advanced disease earlydetection of melanoma is a clinical imperative Different computer-aided diagnosis (CAD) systems have been proposed to increasethe specificity and sensitivity of melanoma detection Although such computer programs are developed for different diagnosticalgorithms to the best of our knowledge a system to classify different melanocytic lesions has not been proposed yet Method Inthis research we present a new approach to the classification of melanocytic lesionsThis work is focused not only on categorizationof skin lesions as benign or malignant but also on specifying the exact type of a skin lesion including melanoma Clark nevusSpitzReed nevus and blue nevus The proposed automatic algorithm contains the following steps image enhancement lesionsegmentation feature extraction and selection as well as classification Results The algorithm has been tested on 300 dermoscopicimages and achieved accuracy of 92 indicating that the proposed approach classified most of the melanocytic lesions correctlyConclusions A proposed system can not only help to precisely diagnose the type of the skin mole but also decrease the amount ofbiopsies and reduce the morbidity related to skin lesion excision

1 Introduction

Body organs are not all internal like the heart brain or liverThere is one we wear on the outside which protects us fromextremes of temperature damaging sunlight and harmfulchemicals Skin is our largest organ which in adolescencecovers about 2 square meters and weights 36 kilograms[1] Like in other parts of the body skin cells can alsogrow abnormally causing cancerous tumors to form Themost popular and widely used method to analyze skinchanges is to observe them under a skin surface micro-scope

The analysis of skin with a microscope started in 1663by Kolhauser who observed vessels of nail matrix In 1958after many years of research the first portable dermatoscopehas been produced [2] Goldman was the first dermatologistto coin the term ldquodermascopyrdquo and to use the dermatoscopeto evaluate pigmented cutaneous lesions Nowadays theexamination of the small moles is possible through a digitalepiluminescence microscopy (ELM also dermoscopy or

dermatoscopy) which is a noninvasive in vivo techniquethat by employing the optical phenomenon of oil immer-sion makes subsurface structures of the skin accessible forexamination in an optic magnification between 10 to 40times [2] Figure 1 presents comparison between the skinmole observed with the naked eye and with a dermoscopeDermoscopy enables clinicians to observe global and localstructures very precisely and thus provides the additionalcriteria for the clinical diagnosis of pigmented skin lesion

This paper is organized in four sections as followsSection 1 covers background information on the nature ofskin cancer presents the clinical definition of different mel-anocytic lesions and describes the motivation and state ofthe art Section 2 specifies melanocytic lesions classificationalgorithm including the following steps preprocessing seg-mentation feature extraction feature selection and classifi-cation method In Section 3 the conducted tests and resultsare described Section 4 closes the paper and highlights futuredirections

Hindawi Publishing CorporationBioMed Research InternationalVolume 2016 Article ID 8934242 17 pageshttpdxdoiorg10115520168934242

2 BioMed Research International

Optical lense

Polarizing filter

Immersion medium

Epidermis

Dermoepidermal junction

Dermis

(a) (b)

Dermoscopy

(c)

Figure 1 (a) Specific optical system for the pigmented skin lesion examination The mole is typically covered with a liquid (usually oil oralcohol) (b) Dermoscopy is performed with a handheld instrument called a dermatoscope (c) Dermoscopy enables clinicians to observeborder irregularities colors and structures within skin lesions that are otherwise not visible to the unaided eye [2 4]

Background Although the two most commonly diagnosedskin cancers are basal cell carcinoma and squamous cellcarcinoma which develop from the nonpigmented cellsof the skin the most aggressive and dangerous is malig-nant melanoma Malignant melanoma (Latin melanomamalignum) originates in pigment producing cells calledmelanocytes and is less common but far more deadlythan cancers mentioned above [3] Melanomas are fast-growing and highly malignant tumors often spreading tonearby lymph nodes lungs and brain (Figure 2) Malignantmelanoma is likely to become one of the most commonmalignant tumors in the future with even a ten times higherincidence rate [2]

Due to the high skin cancer incidence and mortalityrates early diagnosis of melanoma has become an extremelyimportant issue The progress is visible not only in the pri-mary research but also in the development of sophisticatedmore accurate methods of image processing classificationand computer-aided diagnosis [4]

In this paper we present a new approach to the classifi-cation of melanocytic lesions Most of the research done sofar in this area has concentrated on creating new methodsto distinguish benign from malignant skin lesions In ourresearch we go one step further and differentiate melanocyticlesions including Clark nevus Spitz nevus blue nevus andmalignant melanoma In the next part of this paper we willpresent the clinical importance and motivation to undertakethis research

Clinical Importance and Motivation A melanocytic nevus(also called a nevocytic nevus or a mole) is a lesionthat contains nevus cells (a type of pigment cells calledmelanocytes) A mole can be either subdermal (under theskin) or a pigmented growth on the skin formed mostlyof melanocytes The high concentration of the bodyrsquos pig-menting agent melanin is responsible for their dark colorAlthough melanocytic nevi are very common their histoge-nesis is not well understood and still a matter of debate [4]All we know about the life of melanocytic nevi is based oncross section or cohort studies because it is still complicatedtomonitor skin lesions in vivo on a cellular levelThemajorityof moles appear during the first two decades of a personrsquoslife Acquired moles are a form of benign neoplasms whilecongenital moles (or congenital nevi) are considered a minormalformation or hamartoma and may be at a higher risk ofmelanoma [6]

In our research we concentrate on the classification of themost popular melanocytic nevi to the diagnostic categoriesincluding Clark nevus Blue nevus SpitzReed nevus andmalignant melanoma It is of high importance to diagnosecorrectly the type of a skin lesion because the furtherphysiciansrsquo orders depend on it and some of the nevi havea higher risk of developing malignant melanoma than otherones The proposed system can not only help to preciselydiagnose the type of the skin mole but also decrease theamount of biopsies and reduce the morbidity related to skinlesion excision

BioMed Research International 3

Normal skin

Epidermis

Dermis

Subcutaneoustissue

Stage 0Tumor confined

to epidermis

Stage 1Tumor lt 1mm

without

ulceration

Stage 2Tumor 1ndash4mmwith ulceration

Stage 3Spread to

lymph nodes

Lymph node

Stage 4Spread to other

organs

Skin Liver Lungs

Figure 2 Comparison between healthy skin and skin affected by malignant melanoma Presentation of five stages in malignant melanomaevolution process [5]

Figure 3 Reticular and globular type of Clark nevus based on [2]

Clark Nevus Clark nevi have been named in 1978 afterWallace H Clark Jr who described this particular type ofnevus by studying numerous melanocytic nevi in patientswith concomitant melanomas [2] Clark nevi are the mostcommon nevi in man and while these moles are neithercontagious nor dangerous medical experts believe that Clarknevi do present a higher risk of turning into melanoma whencompared to more common moles Clinical dermoscopicand histopathologic variants of Clark nevi are protean andthe differentiation of Clark nevi from melanoma in situand early invasive melanomas is the major challenge in therealm of pigmented skin lesions [2] Clinically Clark neviare flat to elevated or even slightly papillated pigmentedlesions characterized by various shades of brown colorationand situated on the trunk and extremities that are usuallycalled common junctional nevi or common compound nevi

(Figure 3) Those who have a number of Clark nevi shouldpursue a complete skin examination every year

Blue Nevus According to the original definition by Tiecheblue nevus is a dermal-based benign melanocytic lesionhistopathologically made up by variable proportions ofovalspindle and bipolar usually heavily pigmented dendriticcells [2 4 6] Clinically blue nevi appear as relatively regularsharply circumscribed with a uniform blue to gray-blue orsometimes even gray-black pigmentation Dermoscopicallyblue nevi exhibit a homogeneous pattern with completeabsence of local features such as pigment network structuresbrown globules or black dots within the diffuse homogenouspigmentation (Figure 4) This absence of local features andthe presence of a well-defined border usually without streaksare criteria to differentiate blue nevus from melanoma in

4 BioMed Research International

Figure 4 Examples of blue nevus based on [2]

Figure 5 Examples of SpitzReed nevus based on [2]

many cases with a high degree of certainty As blue nevi areharmless no treatment is needed

SpitzReed Nevus SpitzReed nevi are typically small well-circumscribed reddish papules larger reddish plaques butalso verrucous plaques with variegated colors and may beup to one or two centimeters in diameter Dermoscopicallyabout 50 of pigmented Spitz nevi show a symmetricappearance and a characteristic starburst pattern or a globularpattern with a regular discrete blue pigmentation in thecenter and a characteristic rim of large brown globules at theperiphery (Figure 5) Spitz nevi are well-known simulatorsof cutaneous melanoma from a clinical dermoscopic andhistopathologic point of view [2]

Malignant Melanoma Malignant melanoma is the mostaggressive type of skin cancer It is due to the uncontrolledgrowth of melanocytes The first sign of melanoma is usuallyan unusual looking mole or spot Melanomamay be detectedat an early stage when melanocytic lesions are only a fewmillimeters in diameter but they also may grow up to severalcentimeters in diameter before being diagnosed

Clinically a melanoma can have a variety of colorsincluding white brown black blue red or even light greyMelanomas in early stage are usually small more or lessirregularly shaped and outlined macules or slightly elevatedplaques with pigmentation that varies from pink to dark

brown or black Invasive melanomas are as a rule papular ornodular often ulcerated and characteristically exhibit shadesof brown and black but also foci of red white or blue col-oration Dermoscopic features describing melanoma containmulticomponent pattern irregular dotsglobules atypicalpigment network irregular streaks irregular pigmentationregression structures and blue whitish veil (Figure 6) Asuspected melanoma should be surgically removed with a2-3mm margin (excision biopsy) and sent to a pathologylaboratory for a microscopic examination [2]

Related Works During patient examinations researchersobserve that young inexperienced dermatologists and fam-ily physicians have huge difficulties in the correct visualassessment of skin lesions As stated in [7 8] only expertshave arrived at 90 sensitivity and 59 specificity in skinlesion diagnosis while for unexperienced physicians thesefigures show a significant drop till around 62-63 for generalpractitioners [9] Due to these obstacles researchers tryto implement and build computer-aided diagnosis (CAD)systems for automated diagnosis of melanoma to increasethe specificity and sensitivity and to simplify the assessmentof skin moles Two reviews on the state of the art of CADsystems for skin lesion diagnosis can be found in [3 10]For these systems the true positive rate ranges between08 and 10 and the true negative rate between 05 and095 Although such computer programs are developed for

BioMed Research International 5

Figure 6 Examples of malignant melanoma based on [2]

Single lesionanalysis

Preprocessing Segmentation

(i) Black frame removal(ii) Hair detection

(iii) Hair inpaintingFeature

extraction(i) Geometrical feature

(ii) Color-based feature(iii) Texture description(iv) Asymmetry

Feature selectionclassification

(i) Black frame removal(ii) Hair detection

(iii) Hair inpaintingFeature

extraction

Figure 7 The proposed algorithm for the classification of melanocytic lesions based on dermoscopic color images

Table 1 A categorization of feature descriptors commonly used inthe computerized analysis of dermoscopic images

Clinicalfeatures Feature descriptors References

Asymmetry Symmetry distance [37]Lesionrsquos centroid [38]

Borderirregularity

Fourier feature [39]Fractal geometry [40]Area and perimeter [38 41 42]Irregularity index [43 44]

Colorvariegation RGB statistical descriptors [38 45]

Diameter Semimajor axis of the ellipse [38]

Otherfeatures

Pattern analysis [38 46 47]Wavelet-based descriptors [48]Texture descriptors [49]Intensity distribution descriptors [5]Haralick descriptors [50]

different diagnostic algorithms to the best of our knowledgea CAD system to classify differentmelanocytic lesions has notyet been proposed Based on the review article [3] in Table 1we present an extended categorization of feature descriptorswhich are commonly used in the computerized analysis ofdermoscopic images

2 Materials and Methods

The designed system for the automated diagnosis of melan-ocytic lesions is a computer-aided diagnosis system which isdesigned to reproduce the decision of dermatologist based onthe dermoscopy images The proposed methodology of dis-crimination between malignant melanoma and nevi tumorsis shown in Figure 7The automated system is divided into sixmain stages including preprocessing (image enhancement)segmentation feature extraction feature selection classifica-tion and then evaluationThe system enables texture analysiswithout being limited by selection and detection of structureof interest

In this section the preprocessing and segmentation stepsare described shortly based on our previous work whilethe main stage which is feature extraction selection andclassification is presented in detail The application has beenimplemented using Matlab ver 2013

21 Dermoscopic Image Preprocessing The main goal ofthe preprocessing step is to improve the image quality byreducing or even removing the unrelated and surplus parts inthe dermoscopic imagesDermoscopic images are inhomoge-neous and complex For dermoscopy images the preprocess-ing step is obligatory because of extraneous artifacts such asskin lines air bubbles and hairs which appear in virtuallyevery image

To enhance the image and to reduce the influence oflight hairs air bubble small pores shines and reflections amedian filter is being used

6 BioMed Research International

(a) (b)

(c) (d)

(e) (f)

Figure 8 Outcome of the preprocessing step (a) input image (b) black frame removal (c) grayscale conversion (d) top-hat transform andbinarization process (e) hair distinction from other structures and (f) inpainting

Among the most necessary artifact rejection steps is hairremoval because hairs may cover parts of the image andmake the segmentation and texture analysis impossible Anumber of methods have been developed for hair removalin dermoscopic images and they were mostly based onmorphological operations and adaptive thresholding [11ndash13]A good approach for hair removal is the use of top-hattransformationThe process consists of four steps convertingRGB to grayscale image applying black top-hat transforma-tion distinguishing hairs from other local structures andinpainting

The dermoscopic RGB image is being converted intograyscale with the NTSC 1953 standard (Figure 8(c)) Sec-ondly the black top-hat transform which is a morphologicalimage processing technique is used to detect thick darkhairs The result of this step is the difference between theclosing operation and the input image

119879119908(119868) = 119868 ∘ 119887 minus 119868 (1)

where ∘ denotes the closing operation 119868 is the grayscale inputimage and 119887 is a grayscale structuring element [14 15]

The black top-hat transform returns an image containingelements that are darker than their surrounding and smallerthan the structuring element

These steps have been precisely described in our work [1415] Figure 8 presents the outcome after each step

22 Segmentation Algorithm The techniques and algorithmsavailable for segmentation of medical structures are specificto application imaging modality and type of body part to bestudied For the dermoscopic images segmentation processis one of the most challenging and crucial processes Thisprocess for dermoscopic images is extremely difficult dueto several factors low contrast between the healthy skinand moles variegate coloring inside lesions and irregularborders as well as different artifacts Due to the difficultiesdescribed above numerousmethods have been implementedand tested [3] Celebi et al present in their research [16]

BioMed Research International 7

Health skin segmentation

(a) (b) (c)

(d) (e)

Figure 9 Results of the segmentation process (andashd) segmentation of the healthy skin with the region-growing algorithm (e) segmentedarea

the state of the art of segmentation methods and comparethem with the statistical region merging as a recent colorimage segmentation technique based on region growingand merging Based on the achieved results and conductedexperiments the skin lesion extraction is performed byseeded region-growing algorithm [17] in regard to twoaspects Firstly during the preprocessing step the healthyskin becomes homogeneous Secondly the whole skinmole isvisible in the dermoscopic image and surrounded by healthyskin It means that the healthy skin surrounds the moleRegion-growing techniques generally give better results innoisy images where edges are extremely difficult to detect

For the skin lesion segmentation we take one seed whichis located in the left upper corner of the image (Figure 9(a))The region is iteratively grown by comparing all unallocatedneighboring pixels to the regionThe region-growing processconsists of picking a seed from the set investigating all4-connected neighbors of this seed and merging suitableneighbors to the seed The seed is then removed and allmerged neighbors are added to the seed set The region-growing process continues until the seed set is empty

These steps have been precisely described in our work[14] In Figure 9 we present the results of the segmentationstep for several iterations of the region-growing algorithm

23 Geometrical Feature Extraction Geometrical featureshave been used mainly to describe lesionrsquos outline as itsirregularity usually indicates malignancy Those features arebased mostly on such properties of an object as areadiameters or geometric moments To ensure robustness ofthe selected features their formulation does not dependdirectly on objectrsquos perimeter (as in case of raster images itis hard to accurately estimate this quantity) All features arenormalized to ensure their scale- rotation- and translation-invariance

To assess lesionrsquos shape the following features havebeen extracted maximal diameter (maximum distancebetween two arbitrary points) equivalent diameter (119863eq =

radic(4120587) sdot Area) variance of the radial distance distributionrectangularity (ratio of area of an object to area of itsbounding box) elongation (aspect ratio of objectrsquos boundingbox) eccentricity Haralickrsquos compactness and normalizeddiscrete compactness [18ndash21]

Variance of the radial distance distribution is given by [18]

119904119889

2

=1

card (119861)sum119901isin119861

(119889 (119901 119862) minus 119889 (119901 119862))2

119889 (119901 119862)2

(2)

where 119861 is a set of all pixels constituting perimeter of object119874 119862 is geometric centroid of 119874 and 119889(119886 119887) is the distancebetween pixels 119886 and 119887 in a chosen metric As studied lesionsare of circular shape the Euclidean metric has been chosen

Eccentricity measures deviation of a conic curve from acircle and (in mathematics) is formulated as ratio of distancebetween foci of an ellipse to the length of its major axisIn image processing eccentricity of an object 119874 is actuallyvalue of eccentricity of an ellipse with same second momentsas object 119874 [19]

120576 =(11989802minus 11989820)2

+ 411989811

(11989802+ 11989820)2

(3)

where 119898119901119902

denotes an image moment of (119901 119902) order How-ever such a formulation of eccentricity still serves the pur-pose of measuring circularity of an object

Haralickrsquos compactness is a measure for circularity of adigital figure defined as 120583

119877120590119877 where 119877 is a random variable

of the distance between the center of the figure to any part ofits perimeter [20]

8 BioMed Research International

Normalized discrete compactness 119862119863119873

measure is basedon counting the number of cell sides common to adjacentpixels of objectrsquos perimeter a measure called discrete com-pactness 119862

119863[21]

119862119863119873

=119862119863minus 119862119863min

119862119863max minus 119862

119863

=119875 minus 2119899 + 2

119875 + 4radic119899

119862119863=4119899 minus 119875

2 119862119863min = 119899 minus 1 119862

119863max =4119899 minus 4radic119899

2

(4)

where119862119863min119862119863max are respectively lower and upper bound

of discrete compactness of a shape composed of 119899 pixels and119875 is the perimeter of the digital region [21]

24 Color-Based Features The analysis of lesionrsquos colors is animportant source of information when determining lesionrsquostype as malignant lesions are characterized by a rich texture[22]

The following color-based features have been extractednumber of colors present within lesionrsquos area (together withinformation about the presence of two specific colors whiteand black) concentricity centroid distance and 119871

lowast

119886lowast

119887lowast

histogram distances [23 24]To determine the number of colors and to calculate the

concentricity an image has been converted to CIE 119871lowast

119886lowast

119887lowast

color space and then 119886lowast and 119887

lowast color channels have beenclustered into four clusters Three clustering algorithms havebeen tested 119896-means clustering kernel 119896-means clusteringand hierarchical agglomerative clustering [25 26] Kernel119896-means algorithm has been tested using Gaussian kernelfor 120590 isin 1 2 [26] In case of hierarchical agglomerativeclustering Wardrsquos minimum variance method based onEuclidean distance between clusters has been applied as thecriterion for choosing the pair of clusters to merge at eachstep [27]

The number of colors has been related to the greatestdistance between clustersrsquo centroids (each pair of clusters hadbeen considered) and it has been assumed that to identify twocolors as significantly different the distance between them in119886lowast

119887lowast space must exceed certain threshold value

119899colors = max(1 lfloor1120591max (119863)rfloor)

119863 = 119889119864(1198881 1198882) 1198881 1198882isin 119862 and 119888

1= 1198882

(5)

where 119862 is a set of clustersrsquo centroids The threshold value 120591has been set to 120591 = 12

It has been assumed that to state the presence of a certaincolor within the lesion the area of biggest cohesive area ofthat color must be greater than 1 of the area of the wholelesion Pixels have been recognized as white if 119871lowast gt 65 and asblack if 119871lowast lt 15 where 119871lowast is the value of luminosity channelfor the given pixel

Based on the aforementioned clusterization a measurecalled ldquoconcentricityrdquo is derived [23] To compute concen-tricity of an object its color clusters are first arranged inascending order regarding the area of their convex hull(ie segment 119878

1has the smallest area of convex hull and

segment 1198784the greatest) and then a few auxiliary indexes are

calculated The 119899(sdot) function stands for the number of pixelsof the given object

For segment 1198781 the percent area index PA is computed

[23]

PA =119899 (119862119862max)

119899 (1198781)

(6)

where 119862119862max is the largest cohesive set of pixels of 1198781 The PAdescribes how well the core is grouped

Let Hull denote the minimal convex area which includesthe entire pixels of 119878

2 The core inclusion CI is then defined

as [23]

CI =119899 (1198781capHull)

119899 (1198781)

(7)

The CI describes how well the second smallest area encirclesthe core

Let Core denote the minimal convex area which includesthe entire pixels of 119878

1 The hull exclusion HE is given by [23]

HE = 1 minus119899 (1198782cap Core)

119899 (1198782)

(8)

TheHEdescribes how exclusively the hull surrounds the coreFinally concentricity is defined as the product of the three

variables [23]

Concentricity = PA times CI timesHE

Concentricity isin [0 1]

(9)

Concentricity closer to one implies a better concentric struc-ture

Images of all regions used to compute concentricity havebeen compiled in Figure 10

The centroid distance for a color channel is defined asthe distance between the geometric centroid (of the binaryobject) and the brightness centroid of that channel Thebrightness centroid may be considered a center of mass ofan object whose density is determined by intensity valuesof its pixels If the pigmentation in a particular channel ishomogeneous the brightness centroid will be close to thegeometric centroid resulting in small value of the centroidaldistance

Color similarity of two regions has been quantified by the1198711- and 119871

2-norm histogram distances for CIE 119871

lowast

119886lowast

119887lowast color

space coarsely quantized into 4 times 8 times 8 bins [24]

1198711(119867119860 119867119861) =

4times8times8

sum

119894=1

1003816100381610038161003816119867119860 (119894) minus 119867119861(119894)1003816100381610038161003816

1198712(119867119860 119867119861) =

4times8times8

sum

119894=1

(119867119860(119894) minus 119867

119861(119894))2

(10)

where119867119860119867119861are histograms of areas 119860 and 119861 respectively

BioMed Research International 9

Object S1 S2 CCmax

Core S1 cap core Hull S2 cap hull

Figure 10 Regions used to compute concentricity of an object

25 Texture Description Quantitative properties of lesionrsquostexture were described with measures based on a gray levelcooccurrence matrix (GLCM) and a gray level run-lengthmatrix (GLRLM) [28 29]

GLCM is a square matrix 119875 where the (119894 119895)th entry of 119875represents information about the frequency of occurrence ofsuch two adjacent pixels where one of them has intensity 119894

and another has intensity 119895 Such informationmay be used toderive second-order statistical measurements characterizingexamined texture GLCM and six measures derived from it(contrast correlation energy homogeneity maximum prob-ability and dissimilarity) have been calculated as describedby Celebi et al [24]

Contrast = sum

119894119895

(119894 minus 119895)2

119875119894119895

Correlation = minussum

119894119895

(119894 minus 120583119909) (119895 minus 120583

119910)

120590119909120590119910

119875119894119895

Energy = sum

119894

sum

119895

[119875119894119895]2

Homogeneity = sum

119894

sum

119895

119875119894119895

1 + (119894 minus 119895)2

Maximum Probability = max119894119895

119875119894119895

Dissimilarity = sum

119894

sum

119895

119875119894119895sdot1003816100381610038161003816119894 minus 119895

1003816100381610038161003816

(11)

With GLRLM it is possible to analyze higher orderstatistical features for the given texture GLRLM is a two-dimensional matrix in which each element 119901(119894 119895 | 120579) gives

the total number of occurrences of runs of length 119895 at graylevel 119894 in a given direction 120579

In our study the direction of runs is irrelevant asdermoscopic camera has no fixed orientation when takingimages of lesionsmdashthere is no ldquoreference orientationrdquo Theonly thing that matters is texturersquos homogeneity thus insteadof calculating measures for individual orientations 120579 isin

0∘

45∘

90∘

135∘

all GLRLM computed for the aforemen-tioned orientations have been added together and mea-sures have been computed only using this new orientation-invariant matrix

GLRLM is used to derive 11measures Five basicmeasures(SRE LRE GLN RLN and RP) describe the distribution ofrunsrsquo lengths (SRE LRE RLN and RP) and runsrsquo intensity(GLN) [29] As SRE and LRE do not consider pixelsrsquo intensityLGRE andHGREmeasures have been proposed [30] FinallySRLGE SRHGE LRLGE and LRHGEmeasures are based onstatistical properties of a joint probability distribution of bothrunsrsquo intensity and length [31]

SRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198952

LRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

GLN =1

119899119903

119866

sum

119894=1

(

119877

sum

119895=1

119894119895)

2

RLN =1

119899119903

119877

sum

119895=1

(

119866

sum

119894=1

119894119895)

2

10 BioMed Research International

RP =119899119903

119899

LGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942

HGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

SRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942 sdot 1198952

SRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

1198952

LRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

1198942

LRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

1198952

119894119895

(12)

where 119899 is the number of image pixels with 119866 gray levels 119877 isthe maximal run length and 119899

119903is the total number of runs in

an image (119899119903= sum119866

119894=1sum119877

119895=1119894119895)

26 Asymmetry The observed asymmetry is a significantdiagnostic premise as in malignant lesions the arrangementof local structures (eg dots streaks and pigmentation nets)is nonuniform across the whole area of a lesion [32] Someresearchers point out that for instance sharp transitionsbetween central and border area indicatemalignancy [24 33]

Asymmetry measures may be defined in terms of ratiosof feature values across various lesionrsquos area segments or interms of changes of feature values between halves obtainedby splitting lesion area into halves using straight section linespassing through the center of mass [32 33]

In our study a different approach has been adopted Thelesion area has been first divided into sets of subregions inthree differentmanners (Figure 11) by splitting (1) into centraland border part (2) into halves along minor and major axisand (3) into quarters using same axes Then a variance offeature values has been calculated for each set of subregionsAdditionally asymmetry indices proposed by Stoecker et al[34] and the following quantities have been computed

119877119876119894

=119891119876119894

119891119882

(13)

where 119891119876119894

denotes value of feature 119891 calculated for 119894thquadrant and 119891

119882for the whole area respectively

27 Data Preparation In order to apply correlated-basefeature selection method (the choice of this particularmethod is discussed later) numerical features have beendiscretizedThis is due to the fact that in the studied problem

the decision feature is a categorical variable which meansit takes on discrete values and the relationship betweenvariables of those two classes cannot be directly assessedTheonly solution is to discretize the variable taking continuousvalues In general terms discretization is a simple logicalcondition which takes into account one or a few attributesand which aims at splitting a range of data into at least twosubsets Data in our study have been discretized accordingto the method proposed by Fayyad and Irani [35] whichuses heuristics minimizing entropy based on the ldquominimumdescription length principlerdquo

To ensure correct work of algorithms based on analysis ofdistances between points in the feature space (eg 119896-nearestneighbors algorithm SVM) for each feature its values havebeen scaled to the range of [minus1 1] Consequently the riskof a situation in which a feature with larger range of valuesdominates other features has been eliminated Normallydistributed features have been transformed using 119885-score(Equation (14)) whereas other features were rescaled linearlyto the desired range (15) [36] Consider

x =x minus x3119904

where x =1

119873sum

119894

x119894 119904 =

1

119873 minus 1(sum

119894

(x119894minus 120583)2

)

(14)

x = 2x119894minus xmin

xmax minus xminminus 1 (15)

where x is a vector of featurersquos values prior to normalizationand x is the same vector after normalization To determineif a given feature is well modeled by a normal distributionchi-square goodness-of-fit test has been applied

28 Class Imbalance Problem The data set used in thisstudy exhibits class imbalance problem which means thatthe classes are not approximately equally represented (ie itconsists of more than three times as many images of Clarknevus than images of blue nevus) In such case most clas-sifiers will focus on learning how to identify representativesof majority classes leading to poor predictive accuracy forminority classes When making a diagnosis such a situationis unacceptable as misclassification error may lead even topatientrsquos death

The most common way to address this issue is sampling[51] There exist two main types of sampling methodsundersampling which consists in removing representativesof the majority class and oversampling which consists inadding examples of the minority class Out of many samplingtechniques two methods are particularly popular randomundersampling and synthetic minority oversampling technique(SMOTE) In random undersampling method randomlydrawn representatives of the majority class are removed fromthe data set In SMOTE method new ldquosyntheticrdquo examplesare created by averaging a few representatives of the sameminority classes which are nearest neighbors in the featurespace [52]

BioMed Research International 11

(a) Whole area

(b) Inner and outer area

(c) Split along the major axis (d) Split along the minor axis

(e) Split along both major and minor axis simultaneously

Figure 11 A sample division of an object into subregions

In this study SMOTE method has been used as it hasbeen proven to be more effective than random undersam-pling when tested on a set of dermoscopic images obtainedfrom various sources without any common or rare lesiontypes omitted [24]

29 Feature Selection Feature selection consists in reducingdata dimensionality by rejecting redundant unimportant ornoisy features thus resulting in increased prediction accu-racy less complex classifier models and better computationefficiency

Feature selection algorithms may be divided into threemain categories filters wrappers and projections [53 54] Inthis study filter methods have been used for two reasons[24] Firstly as filters are usually fast it is possible to test

a huge number of combinations of features Secondly if onewould like to use wrappers on a given data set the targetlearning algorithm should demonstrate satisfactory resultsfor the original data set as wrappers are based on feedbackprinciple As some features extracted in this study might beirrelevant or redundant as well as due to class imbalancewrappers have not been likely to fulfill those restrictionsOn the other hand projections are used mainly to increasecomputational performance which was not the case in thisstudy

Out of numerous available filters three areworth noticingdue to their satisfactory performance on various data sets[53] ReliefFmutual information-based feature selection andcorrelation-based feature selection [55ndash57] As numerous fea-tures extracted in this study are strongly mutually correlated

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

2 BioMed Research International

Optical lense

Polarizing filter

Immersion medium

Epidermis

Dermoepidermal junction

Dermis

(a) (b)

Dermoscopy

(c)

Figure 1 (a) Specific optical system for the pigmented skin lesion examination The mole is typically covered with a liquid (usually oil oralcohol) (b) Dermoscopy is performed with a handheld instrument called a dermatoscope (c) Dermoscopy enables clinicians to observeborder irregularities colors and structures within skin lesions that are otherwise not visible to the unaided eye [2 4]

Background Although the two most commonly diagnosedskin cancers are basal cell carcinoma and squamous cellcarcinoma which develop from the nonpigmented cellsof the skin the most aggressive and dangerous is malig-nant melanoma Malignant melanoma (Latin melanomamalignum) originates in pigment producing cells calledmelanocytes and is less common but far more deadlythan cancers mentioned above [3] Melanomas are fast-growing and highly malignant tumors often spreading tonearby lymph nodes lungs and brain (Figure 2) Malignantmelanoma is likely to become one of the most commonmalignant tumors in the future with even a ten times higherincidence rate [2]

Due to the high skin cancer incidence and mortalityrates early diagnosis of melanoma has become an extremelyimportant issue The progress is visible not only in the pri-mary research but also in the development of sophisticatedmore accurate methods of image processing classificationand computer-aided diagnosis [4]

In this paper we present a new approach to the classifi-cation of melanocytic lesions Most of the research done sofar in this area has concentrated on creating new methodsto distinguish benign from malignant skin lesions In ourresearch we go one step further and differentiate melanocyticlesions including Clark nevus Spitz nevus blue nevus andmalignant melanoma In the next part of this paper we willpresent the clinical importance and motivation to undertakethis research

Clinical Importance and Motivation A melanocytic nevus(also called a nevocytic nevus or a mole) is a lesionthat contains nevus cells (a type of pigment cells calledmelanocytes) A mole can be either subdermal (under theskin) or a pigmented growth on the skin formed mostlyof melanocytes The high concentration of the bodyrsquos pig-menting agent melanin is responsible for their dark colorAlthough melanocytic nevi are very common their histoge-nesis is not well understood and still a matter of debate [4]All we know about the life of melanocytic nevi is based oncross section or cohort studies because it is still complicatedtomonitor skin lesions in vivo on a cellular levelThemajorityof moles appear during the first two decades of a personrsquoslife Acquired moles are a form of benign neoplasms whilecongenital moles (or congenital nevi) are considered a minormalformation or hamartoma and may be at a higher risk ofmelanoma [6]

In our research we concentrate on the classification of themost popular melanocytic nevi to the diagnostic categoriesincluding Clark nevus Blue nevus SpitzReed nevus andmalignant melanoma It is of high importance to diagnosecorrectly the type of a skin lesion because the furtherphysiciansrsquo orders depend on it and some of the nevi havea higher risk of developing malignant melanoma than otherones The proposed system can not only help to preciselydiagnose the type of the skin mole but also decrease theamount of biopsies and reduce the morbidity related to skinlesion excision

BioMed Research International 3

Normal skin

Epidermis

Dermis

Subcutaneoustissue

Stage 0Tumor confined

to epidermis

Stage 1Tumor lt 1mm

without

ulceration

Stage 2Tumor 1ndash4mmwith ulceration

Stage 3Spread to

lymph nodes

Lymph node

Stage 4Spread to other

organs

Skin Liver Lungs

Figure 2 Comparison between healthy skin and skin affected by malignant melanoma Presentation of five stages in malignant melanomaevolution process [5]

Figure 3 Reticular and globular type of Clark nevus based on [2]

Clark Nevus Clark nevi have been named in 1978 afterWallace H Clark Jr who described this particular type ofnevus by studying numerous melanocytic nevi in patientswith concomitant melanomas [2] Clark nevi are the mostcommon nevi in man and while these moles are neithercontagious nor dangerous medical experts believe that Clarknevi do present a higher risk of turning into melanoma whencompared to more common moles Clinical dermoscopicand histopathologic variants of Clark nevi are protean andthe differentiation of Clark nevi from melanoma in situand early invasive melanomas is the major challenge in therealm of pigmented skin lesions [2] Clinically Clark neviare flat to elevated or even slightly papillated pigmentedlesions characterized by various shades of brown colorationand situated on the trunk and extremities that are usuallycalled common junctional nevi or common compound nevi

(Figure 3) Those who have a number of Clark nevi shouldpursue a complete skin examination every year

Blue Nevus According to the original definition by Tiecheblue nevus is a dermal-based benign melanocytic lesionhistopathologically made up by variable proportions ofovalspindle and bipolar usually heavily pigmented dendriticcells [2 4 6] Clinically blue nevi appear as relatively regularsharply circumscribed with a uniform blue to gray-blue orsometimes even gray-black pigmentation Dermoscopicallyblue nevi exhibit a homogeneous pattern with completeabsence of local features such as pigment network structuresbrown globules or black dots within the diffuse homogenouspigmentation (Figure 4) This absence of local features andthe presence of a well-defined border usually without streaksare criteria to differentiate blue nevus from melanoma in

4 BioMed Research International

Figure 4 Examples of blue nevus based on [2]

Figure 5 Examples of SpitzReed nevus based on [2]

many cases with a high degree of certainty As blue nevi areharmless no treatment is needed

SpitzReed Nevus SpitzReed nevi are typically small well-circumscribed reddish papules larger reddish plaques butalso verrucous plaques with variegated colors and may beup to one or two centimeters in diameter Dermoscopicallyabout 50 of pigmented Spitz nevi show a symmetricappearance and a characteristic starburst pattern or a globularpattern with a regular discrete blue pigmentation in thecenter and a characteristic rim of large brown globules at theperiphery (Figure 5) Spitz nevi are well-known simulatorsof cutaneous melanoma from a clinical dermoscopic andhistopathologic point of view [2]

Malignant Melanoma Malignant melanoma is the mostaggressive type of skin cancer It is due to the uncontrolledgrowth of melanocytes The first sign of melanoma is usuallyan unusual looking mole or spot Melanomamay be detectedat an early stage when melanocytic lesions are only a fewmillimeters in diameter but they also may grow up to severalcentimeters in diameter before being diagnosed

Clinically a melanoma can have a variety of colorsincluding white brown black blue red or even light greyMelanomas in early stage are usually small more or lessirregularly shaped and outlined macules or slightly elevatedplaques with pigmentation that varies from pink to dark

brown or black Invasive melanomas are as a rule papular ornodular often ulcerated and characteristically exhibit shadesof brown and black but also foci of red white or blue col-oration Dermoscopic features describing melanoma containmulticomponent pattern irregular dotsglobules atypicalpigment network irregular streaks irregular pigmentationregression structures and blue whitish veil (Figure 6) Asuspected melanoma should be surgically removed with a2-3mm margin (excision biopsy) and sent to a pathologylaboratory for a microscopic examination [2]

Related Works During patient examinations researchersobserve that young inexperienced dermatologists and fam-ily physicians have huge difficulties in the correct visualassessment of skin lesions As stated in [7 8] only expertshave arrived at 90 sensitivity and 59 specificity in skinlesion diagnosis while for unexperienced physicians thesefigures show a significant drop till around 62-63 for generalpractitioners [9] Due to these obstacles researchers tryto implement and build computer-aided diagnosis (CAD)systems for automated diagnosis of melanoma to increasethe specificity and sensitivity and to simplify the assessmentof skin moles Two reviews on the state of the art of CADsystems for skin lesion diagnosis can be found in [3 10]For these systems the true positive rate ranges between08 and 10 and the true negative rate between 05 and095 Although such computer programs are developed for

BioMed Research International 5

Figure 6 Examples of malignant melanoma based on [2]

Single lesionanalysis

Preprocessing Segmentation

(i) Black frame removal(ii) Hair detection

(iii) Hair inpaintingFeature

extraction(i) Geometrical feature

(ii) Color-based feature(iii) Texture description(iv) Asymmetry

Feature selectionclassification

(i) Black frame removal(ii) Hair detection

(iii) Hair inpaintingFeature

extraction

Figure 7 The proposed algorithm for the classification of melanocytic lesions based on dermoscopic color images

Table 1 A categorization of feature descriptors commonly used inthe computerized analysis of dermoscopic images

Clinicalfeatures Feature descriptors References

Asymmetry Symmetry distance [37]Lesionrsquos centroid [38]

Borderirregularity

Fourier feature [39]Fractal geometry [40]Area and perimeter [38 41 42]Irregularity index [43 44]

Colorvariegation RGB statistical descriptors [38 45]

Diameter Semimajor axis of the ellipse [38]

Otherfeatures

Pattern analysis [38 46 47]Wavelet-based descriptors [48]Texture descriptors [49]Intensity distribution descriptors [5]Haralick descriptors [50]

different diagnostic algorithms to the best of our knowledgea CAD system to classify differentmelanocytic lesions has notyet been proposed Based on the review article [3] in Table 1we present an extended categorization of feature descriptorswhich are commonly used in the computerized analysis ofdermoscopic images

2 Materials and Methods

The designed system for the automated diagnosis of melan-ocytic lesions is a computer-aided diagnosis system which isdesigned to reproduce the decision of dermatologist based onthe dermoscopy images The proposed methodology of dis-crimination between malignant melanoma and nevi tumorsis shown in Figure 7The automated system is divided into sixmain stages including preprocessing (image enhancement)segmentation feature extraction feature selection classifica-tion and then evaluationThe system enables texture analysiswithout being limited by selection and detection of structureof interest

In this section the preprocessing and segmentation stepsare described shortly based on our previous work whilethe main stage which is feature extraction selection andclassification is presented in detail The application has beenimplemented using Matlab ver 2013

21 Dermoscopic Image Preprocessing The main goal ofthe preprocessing step is to improve the image quality byreducing or even removing the unrelated and surplus parts inthe dermoscopic imagesDermoscopic images are inhomoge-neous and complex For dermoscopy images the preprocess-ing step is obligatory because of extraneous artifacts such asskin lines air bubbles and hairs which appear in virtuallyevery image

To enhance the image and to reduce the influence oflight hairs air bubble small pores shines and reflections amedian filter is being used

6 BioMed Research International

(a) (b)

(c) (d)

(e) (f)

Figure 8 Outcome of the preprocessing step (a) input image (b) black frame removal (c) grayscale conversion (d) top-hat transform andbinarization process (e) hair distinction from other structures and (f) inpainting

Among the most necessary artifact rejection steps is hairremoval because hairs may cover parts of the image andmake the segmentation and texture analysis impossible Anumber of methods have been developed for hair removalin dermoscopic images and they were mostly based onmorphological operations and adaptive thresholding [11ndash13]A good approach for hair removal is the use of top-hattransformationThe process consists of four steps convertingRGB to grayscale image applying black top-hat transforma-tion distinguishing hairs from other local structures andinpainting

The dermoscopic RGB image is being converted intograyscale with the NTSC 1953 standard (Figure 8(c)) Sec-ondly the black top-hat transform which is a morphologicalimage processing technique is used to detect thick darkhairs The result of this step is the difference between theclosing operation and the input image

119879119908(119868) = 119868 ∘ 119887 minus 119868 (1)

where ∘ denotes the closing operation 119868 is the grayscale inputimage and 119887 is a grayscale structuring element [14 15]

The black top-hat transform returns an image containingelements that are darker than their surrounding and smallerthan the structuring element

These steps have been precisely described in our work [1415] Figure 8 presents the outcome after each step

22 Segmentation Algorithm The techniques and algorithmsavailable for segmentation of medical structures are specificto application imaging modality and type of body part to bestudied For the dermoscopic images segmentation processis one of the most challenging and crucial processes Thisprocess for dermoscopic images is extremely difficult dueto several factors low contrast between the healthy skinand moles variegate coloring inside lesions and irregularborders as well as different artifacts Due to the difficultiesdescribed above numerousmethods have been implementedand tested [3] Celebi et al present in their research [16]

BioMed Research International 7

Health skin segmentation

(a) (b) (c)

(d) (e)

Figure 9 Results of the segmentation process (andashd) segmentation of the healthy skin with the region-growing algorithm (e) segmentedarea

the state of the art of segmentation methods and comparethem with the statistical region merging as a recent colorimage segmentation technique based on region growingand merging Based on the achieved results and conductedexperiments the skin lesion extraction is performed byseeded region-growing algorithm [17] in regard to twoaspects Firstly during the preprocessing step the healthyskin becomes homogeneous Secondly the whole skinmole isvisible in the dermoscopic image and surrounded by healthyskin It means that the healthy skin surrounds the moleRegion-growing techniques generally give better results innoisy images where edges are extremely difficult to detect

For the skin lesion segmentation we take one seed whichis located in the left upper corner of the image (Figure 9(a))The region is iteratively grown by comparing all unallocatedneighboring pixels to the regionThe region-growing processconsists of picking a seed from the set investigating all4-connected neighbors of this seed and merging suitableneighbors to the seed The seed is then removed and allmerged neighbors are added to the seed set The region-growing process continues until the seed set is empty

These steps have been precisely described in our work[14] In Figure 9 we present the results of the segmentationstep for several iterations of the region-growing algorithm

23 Geometrical Feature Extraction Geometrical featureshave been used mainly to describe lesionrsquos outline as itsirregularity usually indicates malignancy Those features arebased mostly on such properties of an object as areadiameters or geometric moments To ensure robustness ofthe selected features their formulation does not dependdirectly on objectrsquos perimeter (as in case of raster images itis hard to accurately estimate this quantity) All features arenormalized to ensure their scale- rotation- and translation-invariance

To assess lesionrsquos shape the following features havebeen extracted maximal diameter (maximum distancebetween two arbitrary points) equivalent diameter (119863eq =

radic(4120587) sdot Area) variance of the radial distance distributionrectangularity (ratio of area of an object to area of itsbounding box) elongation (aspect ratio of objectrsquos boundingbox) eccentricity Haralickrsquos compactness and normalizeddiscrete compactness [18ndash21]

Variance of the radial distance distribution is given by [18]

119904119889

2

=1

card (119861)sum119901isin119861

(119889 (119901 119862) minus 119889 (119901 119862))2

119889 (119901 119862)2

(2)

where 119861 is a set of all pixels constituting perimeter of object119874 119862 is geometric centroid of 119874 and 119889(119886 119887) is the distancebetween pixels 119886 and 119887 in a chosen metric As studied lesionsare of circular shape the Euclidean metric has been chosen

Eccentricity measures deviation of a conic curve from acircle and (in mathematics) is formulated as ratio of distancebetween foci of an ellipse to the length of its major axisIn image processing eccentricity of an object 119874 is actuallyvalue of eccentricity of an ellipse with same second momentsas object 119874 [19]

120576 =(11989802minus 11989820)2

+ 411989811

(11989802+ 11989820)2

(3)

where 119898119901119902

denotes an image moment of (119901 119902) order How-ever such a formulation of eccentricity still serves the pur-pose of measuring circularity of an object

Haralickrsquos compactness is a measure for circularity of adigital figure defined as 120583

119877120590119877 where 119877 is a random variable

of the distance between the center of the figure to any part ofits perimeter [20]

8 BioMed Research International

Normalized discrete compactness 119862119863119873

measure is basedon counting the number of cell sides common to adjacentpixels of objectrsquos perimeter a measure called discrete com-pactness 119862

119863[21]

119862119863119873

=119862119863minus 119862119863min

119862119863max minus 119862

119863

=119875 minus 2119899 + 2

119875 + 4radic119899

119862119863=4119899 minus 119875

2 119862119863min = 119899 minus 1 119862

119863max =4119899 minus 4radic119899

2

(4)

where119862119863min119862119863max are respectively lower and upper bound

of discrete compactness of a shape composed of 119899 pixels and119875 is the perimeter of the digital region [21]

24 Color-Based Features The analysis of lesionrsquos colors is animportant source of information when determining lesionrsquostype as malignant lesions are characterized by a rich texture[22]

The following color-based features have been extractednumber of colors present within lesionrsquos area (together withinformation about the presence of two specific colors whiteand black) concentricity centroid distance and 119871

lowast

119886lowast

119887lowast

histogram distances [23 24]To determine the number of colors and to calculate the

concentricity an image has been converted to CIE 119871lowast

119886lowast

119887lowast

color space and then 119886lowast and 119887

lowast color channels have beenclustered into four clusters Three clustering algorithms havebeen tested 119896-means clustering kernel 119896-means clusteringand hierarchical agglomerative clustering [25 26] Kernel119896-means algorithm has been tested using Gaussian kernelfor 120590 isin 1 2 [26] In case of hierarchical agglomerativeclustering Wardrsquos minimum variance method based onEuclidean distance between clusters has been applied as thecriterion for choosing the pair of clusters to merge at eachstep [27]

The number of colors has been related to the greatestdistance between clustersrsquo centroids (each pair of clusters hadbeen considered) and it has been assumed that to identify twocolors as significantly different the distance between them in119886lowast

119887lowast space must exceed certain threshold value

119899colors = max(1 lfloor1120591max (119863)rfloor)

119863 = 119889119864(1198881 1198882) 1198881 1198882isin 119862 and 119888

1= 1198882

(5)

where 119862 is a set of clustersrsquo centroids The threshold value 120591has been set to 120591 = 12

It has been assumed that to state the presence of a certaincolor within the lesion the area of biggest cohesive area ofthat color must be greater than 1 of the area of the wholelesion Pixels have been recognized as white if 119871lowast gt 65 and asblack if 119871lowast lt 15 where 119871lowast is the value of luminosity channelfor the given pixel

Based on the aforementioned clusterization a measurecalled ldquoconcentricityrdquo is derived [23] To compute concen-tricity of an object its color clusters are first arranged inascending order regarding the area of their convex hull(ie segment 119878

1has the smallest area of convex hull and

segment 1198784the greatest) and then a few auxiliary indexes are

calculated The 119899(sdot) function stands for the number of pixelsof the given object

For segment 1198781 the percent area index PA is computed

[23]

PA =119899 (119862119862max)

119899 (1198781)

(6)

where 119862119862max is the largest cohesive set of pixels of 1198781 The PAdescribes how well the core is grouped

Let Hull denote the minimal convex area which includesthe entire pixels of 119878

2 The core inclusion CI is then defined

as [23]

CI =119899 (1198781capHull)

119899 (1198781)

(7)

The CI describes how well the second smallest area encirclesthe core

Let Core denote the minimal convex area which includesthe entire pixels of 119878

1 The hull exclusion HE is given by [23]

HE = 1 minus119899 (1198782cap Core)

119899 (1198782)

(8)

TheHEdescribes how exclusively the hull surrounds the coreFinally concentricity is defined as the product of the three

variables [23]

Concentricity = PA times CI timesHE

Concentricity isin [0 1]

(9)

Concentricity closer to one implies a better concentric struc-ture

Images of all regions used to compute concentricity havebeen compiled in Figure 10

The centroid distance for a color channel is defined asthe distance between the geometric centroid (of the binaryobject) and the brightness centroid of that channel Thebrightness centroid may be considered a center of mass ofan object whose density is determined by intensity valuesof its pixels If the pigmentation in a particular channel ishomogeneous the brightness centroid will be close to thegeometric centroid resulting in small value of the centroidaldistance

Color similarity of two regions has been quantified by the1198711- and 119871

2-norm histogram distances for CIE 119871

lowast

119886lowast

119887lowast color

space coarsely quantized into 4 times 8 times 8 bins [24]

1198711(119867119860 119867119861) =

4times8times8

sum

119894=1

1003816100381610038161003816119867119860 (119894) minus 119867119861(119894)1003816100381610038161003816

1198712(119867119860 119867119861) =

4times8times8

sum

119894=1

(119867119860(119894) minus 119867

119861(119894))2

(10)

where119867119860119867119861are histograms of areas 119860 and 119861 respectively

BioMed Research International 9

Object S1 S2 CCmax

Core S1 cap core Hull S2 cap hull

Figure 10 Regions used to compute concentricity of an object

25 Texture Description Quantitative properties of lesionrsquostexture were described with measures based on a gray levelcooccurrence matrix (GLCM) and a gray level run-lengthmatrix (GLRLM) [28 29]

GLCM is a square matrix 119875 where the (119894 119895)th entry of 119875represents information about the frequency of occurrence ofsuch two adjacent pixels where one of them has intensity 119894

and another has intensity 119895 Such informationmay be used toderive second-order statistical measurements characterizingexamined texture GLCM and six measures derived from it(contrast correlation energy homogeneity maximum prob-ability and dissimilarity) have been calculated as describedby Celebi et al [24]

Contrast = sum

119894119895

(119894 minus 119895)2

119875119894119895

Correlation = minussum

119894119895

(119894 minus 120583119909) (119895 minus 120583

119910)

120590119909120590119910

119875119894119895

Energy = sum

119894

sum

119895

[119875119894119895]2

Homogeneity = sum

119894

sum

119895

119875119894119895

1 + (119894 minus 119895)2

Maximum Probability = max119894119895

119875119894119895

Dissimilarity = sum

119894

sum

119895

119875119894119895sdot1003816100381610038161003816119894 minus 119895

1003816100381610038161003816

(11)

With GLRLM it is possible to analyze higher orderstatistical features for the given texture GLRLM is a two-dimensional matrix in which each element 119901(119894 119895 | 120579) gives

the total number of occurrences of runs of length 119895 at graylevel 119894 in a given direction 120579

In our study the direction of runs is irrelevant asdermoscopic camera has no fixed orientation when takingimages of lesionsmdashthere is no ldquoreference orientationrdquo Theonly thing that matters is texturersquos homogeneity thus insteadof calculating measures for individual orientations 120579 isin

0∘

45∘

90∘

135∘

all GLRLM computed for the aforemen-tioned orientations have been added together and mea-sures have been computed only using this new orientation-invariant matrix

GLRLM is used to derive 11measures Five basicmeasures(SRE LRE GLN RLN and RP) describe the distribution ofrunsrsquo lengths (SRE LRE RLN and RP) and runsrsquo intensity(GLN) [29] As SRE and LRE do not consider pixelsrsquo intensityLGRE andHGREmeasures have been proposed [30] FinallySRLGE SRHGE LRLGE and LRHGEmeasures are based onstatistical properties of a joint probability distribution of bothrunsrsquo intensity and length [31]

SRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198952

LRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

GLN =1

119899119903

119866

sum

119894=1

(

119877

sum

119895=1

119894119895)

2

RLN =1

119899119903

119877

sum

119895=1

(

119866

sum

119894=1

119894119895)

2

10 BioMed Research International

RP =119899119903

119899

LGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942

HGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

SRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942 sdot 1198952

SRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

1198952

LRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

1198942

LRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

1198952

119894119895

(12)

where 119899 is the number of image pixels with 119866 gray levels 119877 isthe maximal run length and 119899

119903is the total number of runs in

an image (119899119903= sum119866

119894=1sum119877

119895=1119894119895)

26 Asymmetry The observed asymmetry is a significantdiagnostic premise as in malignant lesions the arrangementof local structures (eg dots streaks and pigmentation nets)is nonuniform across the whole area of a lesion [32] Someresearchers point out that for instance sharp transitionsbetween central and border area indicatemalignancy [24 33]

Asymmetry measures may be defined in terms of ratiosof feature values across various lesionrsquos area segments or interms of changes of feature values between halves obtainedby splitting lesion area into halves using straight section linespassing through the center of mass [32 33]

In our study a different approach has been adopted Thelesion area has been first divided into sets of subregions inthree differentmanners (Figure 11) by splitting (1) into centraland border part (2) into halves along minor and major axisand (3) into quarters using same axes Then a variance offeature values has been calculated for each set of subregionsAdditionally asymmetry indices proposed by Stoecker et al[34] and the following quantities have been computed

119877119876119894

=119891119876119894

119891119882

(13)

where 119891119876119894

denotes value of feature 119891 calculated for 119894thquadrant and 119891

119882for the whole area respectively

27 Data Preparation In order to apply correlated-basefeature selection method (the choice of this particularmethod is discussed later) numerical features have beendiscretizedThis is due to the fact that in the studied problem

the decision feature is a categorical variable which meansit takes on discrete values and the relationship betweenvariables of those two classes cannot be directly assessedTheonly solution is to discretize the variable taking continuousvalues In general terms discretization is a simple logicalcondition which takes into account one or a few attributesand which aims at splitting a range of data into at least twosubsets Data in our study have been discretized accordingto the method proposed by Fayyad and Irani [35] whichuses heuristics minimizing entropy based on the ldquominimumdescription length principlerdquo

To ensure correct work of algorithms based on analysis ofdistances between points in the feature space (eg 119896-nearestneighbors algorithm SVM) for each feature its values havebeen scaled to the range of [minus1 1] Consequently the riskof a situation in which a feature with larger range of valuesdominates other features has been eliminated Normallydistributed features have been transformed using 119885-score(Equation (14)) whereas other features were rescaled linearlyto the desired range (15) [36] Consider

x =x minus x3119904

where x =1

119873sum

119894

x119894 119904 =

1

119873 minus 1(sum

119894

(x119894minus 120583)2

)

(14)

x = 2x119894minus xmin

xmax minus xminminus 1 (15)

where x is a vector of featurersquos values prior to normalizationand x is the same vector after normalization To determineif a given feature is well modeled by a normal distributionchi-square goodness-of-fit test has been applied

28 Class Imbalance Problem The data set used in thisstudy exhibits class imbalance problem which means thatthe classes are not approximately equally represented (ie itconsists of more than three times as many images of Clarknevus than images of blue nevus) In such case most clas-sifiers will focus on learning how to identify representativesof majority classes leading to poor predictive accuracy forminority classes When making a diagnosis such a situationis unacceptable as misclassification error may lead even topatientrsquos death

The most common way to address this issue is sampling[51] There exist two main types of sampling methodsundersampling which consists in removing representativesof the majority class and oversampling which consists inadding examples of the minority class Out of many samplingtechniques two methods are particularly popular randomundersampling and synthetic minority oversampling technique(SMOTE) In random undersampling method randomlydrawn representatives of the majority class are removed fromthe data set In SMOTE method new ldquosyntheticrdquo examplesare created by averaging a few representatives of the sameminority classes which are nearest neighbors in the featurespace [52]

BioMed Research International 11

(a) Whole area

(b) Inner and outer area

(c) Split along the major axis (d) Split along the minor axis

(e) Split along both major and minor axis simultaneously

Figure 11 A sample division of an object into subregions

In this study SMOTE method has been used as it hasbeen proven to be more effective than random undersam-pling when tested on a set of dermoscopic images obtainedfrom various sources without any common or rare lesiontypes omitted [24]

29 Feature Selection Feature selection consists in reducingdata dimensionality by rejecting redundant unimportant ornoisy features thus resulting in increased prediction accu-racy less complex classifier models and better computationefficiency

Feature selection algorithms may be divided into threemain categories filters wrappers and projections [53 54] Inthis study filter methods have been used for two reasons[24] Firstly as filters are usually fast it is possible to test

a huge number of combinations of features Secondly if onewould like to use wrappers on a given data set the targetlearning algorithm should demonstrate satisfactory resultsfor the original data set as wrappers are based on feedbackprinciple As some features extracted in this study might beirrelevant or redundant as well as due to class imbalancewrappers have not been likely to fulfill those restrictionsOn the other hand projections are used mainly to increasecomputational performance which was not the case in thisstudy

Out of numerous available filters three areworth noticingdue to their satisfactory performance on various data sets[53] ReliefFmutual information-based feature selection andcorrelation-based feature selection [55ndash57] As numerous fea-tures extracted in this study are strongly mutually correlated

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

BioMed Research International 3

Normal skin

Epidermis

Dermis

Subcutaneoustissue

Stage 0Tumor confined

to epidermis

Stage 1Tumor lt 1mm

without

ulceration

Stage 2Tumor 1ndash4mmwith ulceration

Stage 3Spread to

lymph nodes

Lymph node

Stage 4Spread to other

organs

Skin Liver Lungs

Figure 2 Comparison between healthy skin and skin affected by malignant melanoma Presentation of five stages in malignant melanomaevolution process [5]

Figure 3 Reticular and globular type of Clark nevus based on [2]

Clark Nevus Clark nevi have been named in 1978 afterWallace H Clark Jr who described this particular type ofnevus by studying numerous melanocytic nevi in patientswith concomitant melanomas [2] Clark nevi are the mostcommon nevi in man and while these moles are neithercontagious nor dangerous medical experts believe that Clarknevi do present a higher risk of turning into melanoma whencompared to more common moles Clinical dermoscopicand histopathologic variants of Clark nevi are protean andthe differentiation of Clark nevi from melanoma in situand early invasive melanomas is the major challenge in therealm of pigmented skin lesions [2] Clinically Clark neviare flat to elevated or even slightly papillated pigmentedlesions characterized by various shades of brown colorationand situated on the trunk and extremities that are usuallycalled common junctional nevi or common compound nevi

(Figure 3) Those who have a number of Clark nevi shouldpursue a complete skin examination every year

Blue Nevus According to the original definition by Tiecheblue nevus is a dermal-based benign melanocytic lesionhistopathologically made up by variable proportions ofovalspindle and bipolar usually heavily pigmented dendriticcells [2 4 6] Clinically blue nevi appear as relatively regularsharply circumscribed with a uniform blue to gray-blue orsometimes even gray-black pigmentation Dermoscopicallyblue nevi exhibit a homogeneous pattern with completeabsence of local features such as pigment network structuresbrown globules or black dots within the diffuse homogenouspigmentation (Figure 4) This absence of local features andthe presence of a well-defined border usually without streaksare criteria to differentiate blue nevus from melanoma in

4 BioMed Research International

Figure 4 Examples of blue nevus based on [2]

Figure 5 Examples of SpitzReed nevus based on [2]

many cases with a high degree of certainty As blue nevi areharmless no treatment is needed

SpitzReed Nevus SpitzReed nevi are typically small well-circumscribed reddish papules larger reddish plaques butalso verrucous plaques with variegated colors and may beup to one or two centimeters in diameter Dermoscopicallyabout 50 of pigmented Spitz nevi show a symmetricappearance and a characteristic starburst pattern or a globularpattern with a regular discrete blue pigmentation in thecenter and a characteristic rim of large brown globules at theperiphery (Figure 5) Spitz nevi are well-known simulatorsof cutaneous melanoma from a clinical dermoscopic andhistopathologic point of view [2]

Malignant Melanoma Malignant melanoma is the mostaggressive type of skin cancer It is due to the uncontrolledgrowth of melanocytes The first sign of melanoma is usuallyan unusual looking mole or spot Melanomamay be detectedat an early stage when melanocytic lesions are only a fewmillimeters in diameter but they also may grow up to severalcentimeters in diameter before being diagnosed

Clinically a melanoma can have a variety of colorsincluding white brown black blue red or even light greyMelanomas in early stage are usually small more or lessirregularly shaped and outlined macules or slightly elevatedplaques with pigmentation that varies from pink to dark

brown or black Invasive melanomas are as a rule papular ornodular often ulcerated and characteristically exhibit shadesof brown and black but also foci of red white or blue col-oration Dermoscopic features describing melanoma containmulticomponent pattern irregular dotsglobules atypicalpigment network irregular streaks irregular pigmentationregression structures and blue whitish veil (Figure 6) Asuspected melanoma should be surgically removed with a2-3mm margin (excision biopsy) and sent to a pathologylaboratory for a microscopic examination [2]

Related Works During patient examinations researchersobserve that young inexperienced dermatologists and fam-ily physicians have huge difficulties in the correct visualassessment of skin lesions As stated in [7 8] only expertshave arrived at 90 sensitivity and 59 specificity in skinlesion diagnosis while for unexperienced physicians thesefigures show a significant drop till around 62-63 for generalpractitioners [9] Due to these obstacles researchers tryto implement and build computer-aided diagnosis (CAD)systems for automated diagnosis of melanoma to increasethe specificity and sensitivity and to simplify the assessmentof skin moles Two reviews on the state of the art of CADsystems for skin lesion diagnosis can be found in [3 10]For these systems the true positive rate ranges between08 and 10 and the true negative rate between 05 and095 Although such computer programs are developed for

BioMed Research International 5

Figure 6 Examples of malignant melanoma based on [2]

Single lesionanalysis

Preprocessing Segmentation

(i) Black frame removal(ii) Hair detection

(iii) Hair inpaintingFeature

extraction(i) Geometrical feature

(ii) Color-based feature(iii) Texture description(iv) Asymmetry

Feature selectionclassification

(i) Black frame removal(ii) Hair detection

(iii) Hair inpaintingFeature

extraction

Figure 7 The proposed algorithm for the classification of melanocytic lesions based on dermoscopic color images

Table 1 A categorization of feature descriptors commonly used inthe computerized analysis of dermoscopic images

Clinicalfeatures Feature descriptors References

Asymmetry Symmetry distance [37]Lesionrsquos centroid [38]

Borderirregularity

Fourier feature [39]Fractal geometry [40]Area and perimeter [38 41 42]Irregularity index [43 44]

Colorvariegation RGB statistical descriptors [38 45]

Diameter Semimajor axis of the ellipse [38]

Otherfeatures

Pattern analysis [38 46 47]Wavelet-based descriptors [48]Texture descriptors [49]Intensity distribution descriptors [5]Haralick descriptors [50]

different diagnostic algorithms to the best of our knowledgea CAD system to classify differentmelanocytic lesions has notyet been proposed Based on the review article [3] in Table 1we present an extended categorization of feature descriptorswhich are commonly used in the computerized analysis ofdermoscopic images

2 Materials and Methods

The designed system for the automated diagnosis of melan-ocytic lesions is a computer-aided diagnosis system which isdesigned to reproduce the decision of dermatologist based onthe dermoscopy images The proposed methodology of dis-crimination between malignant melanoma and nevi tumorsis shown in Figure 7The automated system is divided into sixmain stages including preprocessing (image enhancement)segmentation feature extraction feature selection classifica-tion and then evaluationThe system enables texture analysiswithout being limited by selection and detection of structureof interest

In this section the preprocessing and segmentation stepsare described shortly based on our previous work whilethe main stage which is feature extraction selection andclassification is presented in detail The application has beenimplemented using Matlab ver 2013

21 Dermoscopic Image Preprocessing The main goal ofthe preprocessing step is to improve the image quality byreducing or even removing the unrelated and surplus parts inthe dermoscopic imagesDermoscopic images are inhomoge-neous and complex For dermoscopy images the preprocess-ing step is obligatory because of extraneous artifacts such asskin lines air bubbles and hairs which appear in virtuallyevery image

To enhance the image and to reduce the influence oflight hairs air bubble small pores shines and reflections amedian filter is being used

6 BioMed Research International

(a) (b)

(c) (d)

(e) (f)

Figure 8 Outcome of the preprocessing step (a) input image (b) black frame removal (c) grayscale conversion (d) top-hat transform andbinarization process (e) hair distinction from other structures and (f) inpainting

Among the most necessary artifact rejection steps is hairremoval because hairs may cover parts of the image andmake the segmentation and texture analysis impossible Anumber of methods have been developed for hair removalin dermoscopic images and they were mostly based onmorphological operations and adaptive thresholding [11ndash13]A good approach for hair removal is the use of top-hattransformationThe process consists of four steps convertingRGB to grayscale image applying black top-hat transforma-tion distinguishing hairs from other local structures andinpainting

The dermoscopic RGB image is being converted intograyscale with the NTSC 1953 standard (Figure 8(c)) Sec-ondly the black top-hat transform which is a morphologicalimage processing technique is used to detect thick darkhairs The result of this step is the difference between theclosing operation and the input image

119879119908(119868) = 119868 ∘ 119887 minus 119868 (1)

where ∘ denotes the closing operation 119868 is the grayscale inputimage and 119887 is a grayscale structuring element [14 15]

The black top-hat transform returns an image containingelements that are darker than their surrounding and smallerthan the structuring element

These steps have been precisely described in our work [1415] Figure 8 presents the outcome after each step

22 Segmentation Algorithm The techniques and algorithmsavailable for segmentation of medical structures are specificto application imaging modality and type of body part to bestudied For the dermoscopic images segmentation processis one of the most challenging and crucial processes Thisprocess for dermoscopic images is extremely difficult dueto several factors low contrast between the healthy skinand moles variegate coloring inside lesions and irregularborders as well as different artifacts Due to the difficultiesdescribed above numerousmethods have been implementedand tested [3] Celebi et al present in their research [16]

BioMed Research International 7

Health skin segmentation

(a) (b) (c)

(d) (e)

Figure 9 Results of the segmentation process (andashd) segmentation of the healthy skin with the region-growing algorithm (e) segmentedarea

the state of the art of segmentation methods and comparethem with the statistical region merging as a recent colorimage segmentation technique based on region growingand merging Based on the achieved results and conductedexperiments the skin lesion extraction is performed byseeded region-growing algorithm [17] in regard to twoaspects Firstly during the preprocessing step the healthyskin becomes homogeneous Secondly the whole skinmole isvisible in the dermoscopic image and surrounded by healthyskin It means that the healthy skin surrounds the moleRegion-growing techniques generally give better results innoisy images where edges are extremely difficult to detect

For the skin lesion segmentation we take one seed whichis located in the left upper corner of the image (Figure 9(a))The region is iteratively grown by comparing all unallocatedneighboring pixels to the regionThe region-growing processconsists of picking a seed from the set investigating all4-connected neighbors of this seed and merging suitableneighbors to the seed The seed is then removed and allmerged neighbors are added to the seed set The region-growing process continues until the seed set is empty

These steps have been precisely described in our work[14] In Figure 9 we present the results of the segmentationstep for several iterations of the region-growing algorithm

23 Geometrical Feature Extraction Geometrical featureshave been used mainly to describe lesionrsquos outline as itsirregularity usually indicates malignancy Those features arebased mostly on such properties of an object as areadiameters or geometric moments To ensure robustness ofthe selected features their formulation does not dependdirectly on objectrsquos perimeter (as in case of raster images itis hard to accurately estimate this quantity) All features arenormalized to ensure their scale- rotation- and translation-invariance

To assess lesionrsquos shape the following features havebeen extracted maximal diameter (maximum distancebetween two arbitrary points) equivalent diameter (119863eq =

radic(4120587) sdot Area) variance of the radial distance distributionrectangularity (ratio of area of an object to area of itsbounding box) elongation (aspect ratio of objectrsquos boundingbox) eccentricity Haralickrsquos compactness and normalizeddiscrete compactness [18ndash21]

Variance of the radial distance distribution is given by [18]

119904119889

2

=1

card (119861)sum119901isin119861

(119889 (119901 119862) minus 119889 (119901 119862))2

119889 (119901 119862)2

(2)

where 119861 is a set of all pixels constituting perimeter of object119874 119862 is geometric centroid of 119874 and 119889(119886 119887) is the distancebetween pixels 119886 and 119887 in a chosen metric As studied lesionsare of circular shape the Euclidean metric has been chosen

Eccentricity measures deviation of a conic curve from acircle and (in mathematics) is formulated as ratio of distancebetween foci of an ellipse to the length of its major axisIn image processing eccentricity of an object 119874 is actuallyvalue of eccentricity of an ellipse with same second momentsas object 119874 [19]

120576 =(11989802minus 11989820)2

+ 411989811

(11989802+ 11989820)2

(3)

where 119898119901119902

denotes an image moment of (119901 119902) order How-ever such a formulation of eccentricity still serves the pur-pose of measuring circularity of an object

Haralickrsquos compactness is a measure for circularity of adigital figure defined as 120583

119877120590119877 where 119877 is a random variable

of the distance between the center of the figure to any part ofits perimeter [20]

8 BioMed Research International

Normalized discrete compactness 119862119863119873

measure is basedon counting the number of cell sides common to adjacentpixels of objectrsquos perimeter a measure called discrete com-pactness 119862

119863[21]

119862119863119873

=119862119863minus 119862119863min

119862119863max minus 119862

119863

=119875 minus 2119899 + 2

119875 + 4radic119899

119862119863=4119899 minus 119875

2 119862119863min = 119899 minus 1 119862

119863max =4119899 minus 4radic119899

2

(4)

where119862119863min119862119863max are respectively lower and upper bound

of discrete compactness of a shape composed of 119899 pixels and119875 is the perimeter of the digital region [21]

24 Color-Based Features The analysis of lesionrsquos colors is animportant source of information when determining lesionrsquostype as malignant lesions are characterized by a rich texture[22]

The following color-based features have been extractednumber of colors present within lesionrsquos area (together withinformation about the presence of two specific colors whiteand black) concentricity centroid distance and 119871

lowast

119886lowast

119887lowast

histogram distances [23 24]To determine the number of colors and to calculate the

concentricity an image has been converted to CIE 119871lowast

119886lowast

119887lowast

color space and then 119886lowast and 119887

lowast color channels have beenclustered into four clusters Three clustering algorithms havebeen tested 119896-means clustering kernel 119896-means clusteringand hierarchical agglomerative clustering [25 26] Kernel119896-means algorithm has been tested using Gaussian kernelfor 120590 isin 1 2 [26] In case of hierarchical agglomerativeclustering Wardrsquos minimum variance method based onEuclidean distance between clusters has been applied as thecriterion for choosing the pair of clusters to merge at eachstep [27]

The number of colors has been related to the greatestdistance between clustersrsquo centroids (each pair of clusters hadbeen considered) and it has been assumed that to identify twocolors as significantly different the distance between them in119886lowast

119887lowast space must exceed certain threshold value

119899colors = max(1 lfloor1120591max (119863)rfloor)

119863 = 119889119864(1198881 1198882) 1198881 1198882isin 119862 and 119888

1= 1198882

(5)

where 119862 is a set of clustersrsquo centroids The threshold value 120591has been set to 120591 = 12

It has been assumed that to state the presence of a certaincolor within the lesion the area of biggest cohesive area ofthat color must be greater than 1 of the area of the wholelesion Pixels have been recognized as white if 119871lowast gt 65 and asblack if 119871lowast lt 15 where 119871lowast is the value of luminosity channelfor the given pixel

Based on the aforementioned clusterization a measurecalled ldquoconcentricityrdquo is derived [23] To compute concen-tricity of an object its color clusters are first arranged inascending order regarding the area of their convex hull(ie segment 119878

1has the smallest area of convex hull and

segment 1198784the greatest) and then a few auxiliary indexes are

calculated The 119899(sdot) function stands for the number of pixelsof the given object

For segment 1198781 the percent area index PA is computed

[23]

PA =119899 (119862119862max)

119899 (1198781)

(6)

where 119862119862max is the largest cohesive set of pixels of 1198781 The PAdescribes how well the core is grouped

Let Hull denote the minimal convex area which includesthe entire pixels of 119878

2 The core inclusion CI is then defined

as [23]

CI =119899 (1198781capHull)

119899 (1198781)

(7)

The CI describes how well the second smallest area encirclesthe core

Let Core denote the minimal convex area which includesthe entire pixels of 119878

1 The hull exclusion HE is given by [23]

HE = 1 minus119899 (1198782cap Core)

119899 (1198782)

(8)

TheHEdescribes how exclusively the hull surrounds the coreFinally concentricity is defined as the product of the three

variables [23]

Concentricity = PA times CI timesHE

Concentricity isin [0 1]

(9)

Concentricity closer to one implies a better concentric struc-ture

Images of all regions used to compute concentricity havebeen compiled in Figure 10

The centroid distance for a color channel is defined asthe distance between the geometric centroid (of the binaryobject) and the brightness centroid of that channel Thebrightness centroid may be considered a center of mass ofan object whose density is determined by intensity valuesof its pixels If the pigmentation in a particular channel ishomogeneous the brightness centroid will be close to thegeometric centroid resulting in small value of the centroidaldistance

Color similarity of two regions has been quantified by the1198711- and 119871

2-norm histogram distances for CIE 119871

lowast

119886lowast

119887lowast color

space coarsely quantized into 4 times 8 times 8 bins [24]

1198711(119867119860 119867119861) =

4times8times8

sum

119894=1

1003816100381610038161003816119867119860 (119894) minus 119867119861(119894)1003816100381610038161003816

1198712(119867119860 119867119861) =

4times8times8

sum

119894=1

(119867119860(119894) minus 119867

119861(119894))2

(10)

where119867119860119867119861are histograms of areas 119860 and 119861 respectively

BioMed Research International 9

Object S1 S2 CCmax

Core S1 cap core Hull S2 cap hull

Figure 10 Regions used to compute concentricity of an object

25 Texture Description Quantitative properties of lesionrsquostexture were described with measures based on a gray levelcooccurrence matrix (GLCM) and a gray level run-lengthmatrix (GLRLM) [28 29]

GLCM is a square matrix 119875 where the (119894 119895)th entry of 119875represents information about the frequency of occurrence ofsuch two adjacent pixels where one of them has intensity 119894

and another has intensity 119895 Such informationmay be used toderive second-order statistical measurements characterizingexamined texture GLCM and six measures derived from it(contrast correlation energy homogeneity maximum prob-ability and dissimilarity) have been calculated as describedby Celebi et al [24]

Contrast = sum

119894119895

(119894 minus 119895)2

119875119894119895

Correlation = minussum

119894119895

(119894 minus 120583119909) (119895 minus 120583

119910)

120590119909120590119910

119875119894119895

Energy = sum

119894

sum

119895

[119875119894119895]2

Homogeneity = sum

119894

sum

119895

119875119894119895

1 + (119894 minus 119895)2

Maximum Probability = max119894119895

119875119894119895

Dissimilarity = sum

119894

sum

119895

119875119894119895sdot1003816100381610038161003816119894 minus 119895

1003816100381610038161003816

(11)

With GLRLM it is possible to analyze higher orderstatistical features for the given texture GLRLM is a two-dimensional matrix in which each element 119901(119894 119895 | 120579) gives

the total number of occurrences of runs of length 119895 at graylevel 119894 in a given direction 120579

In our study the direction of runs is irrelevant asdermoscopic camera has no fixed orientation when takingimages of lesionsmdashthere is no ldquoreference orientationrdquo Theonly thing that matters is texturersquos homogeneity thus insteadof calculating measures for individual orientations 120579 isin

0∘

45∘

90∘

135∘

all GLRLM computed for the aforemen-tioned orientations have been added together and mea-sures have been computed only using this new orientation-invariant matrix

GLRLM is used to derive 11measures Five basicmeasures(SRE LRE GLN RLN and RP) describe the distribution ofrunsrsquo lengths (SRE LRE RLN and RP) and runsrsquo intensity(GLN) [29] As SRE and LRE do not consider pixelsrsquo intensityLGRE andHGREmeasures have been proposed [30] FinallySRLGE SRHGE LRLGE and LRHGEmeasures are based onstatistical properties of a joint probability distribution of bothrunsrsquo intensity and length [31]

SRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198952

LRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

GLN =1

119899119903

119866

sum

119894=1

(

119877

sum

119895=1

119894119895)

2

RLN =1

119899119903

119877

sum

119895=1

(

119866

sum

119894=1

119894119895)

2

10 BioMed Research International

RP =119899119903

119899

LGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942

HGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

SRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942 sdot 1198952

SRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

1198952

LRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

1198942

LRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

1198952

119894119895

(12)

where 119899 is the number of image pixels with 119866 gray levels 119877 isthe maximal run length and 119899

119903is the total number of runs in

an image (119899119903= sum119866

119894=1sum119877

119895=1119894119895)

26 Asymmetry The observed asymmetry is a significantdiagnostic premise as in malignant lesions the arrangementof local structures (eg dots streaks and pigmentation nets)is nonuniform across the whole area of a lesion [32] Someresearchers point out that for instance sharp transitionsbetween central and border area indicatemalignancy [24 33]

Asymmetry measures may be defined in terms of ratiosof feature values across various lesionrsquos area segments or interms of changes of feature values between halves obtainedby splitting lesion area into halves using straight section linespassing through the center of mass [32 33]

In our study a different approach has been adopted Thelesion area has been first divided into sets of subregions inthree differentmanners (Figure 11) by splitting (1) into centraland border part (2) into halves along minor and major axisand (3) into quarters using same axes Then a variance offeature values has been calculated for each set of subregionsAdditionally asymmetry indices proposed by Stoecker et al[34] and the following quantities have been computed

119877119876119894

=119891119876119894

119891119882

(13)

where 119891119876119894

denotes value of feature 119891 calculated for 119894thquadrant and 119891

119882for the whole area respectively

27 Data Preparation In order to apply correlated-basefeature selection method (the choice of this particularmethod is discussed later) numerical features have beendiscretizedThis is due to the fact that in the studied problem

the decision feature is a categorical variable which meansit takes on discrete values and the relationship betweenvariables of those two classes cannot be directly assessedTheonly solution is to discretize the variable taking continuousvalues In general terms discretization is a simple logicalcondition which takes into account one or a few attributesand which aims at splitting a range of data into at least twosubsets Data in our study have been discretized accordingto the method proposed by Fayyad and Irani [35] whichuses heuristics minimizing entropy based on the ldquominimumdescription length principlerdquo

To ensure correct work of algorithms based on analysis ofdistances between points in the feature space (eg 119896-nearestneighbors algorithm SVM) for each feature its values havebeen scaled to the range of [minus1 1] Consequently the riskof a situation in which a feature with larger range of valuesdominates other features has been eliminated Normallydistributed features have been transformed using 119885-score(Equation (14)) whereas other features were rescaled linearlyto the desired range (15) [36] Consider

x =x minus x3119904

where x =1

119873sum

119894

x119894 119904 =

1

119873 minus 1(sum

119894

(x119894minus 120583)2

)

(14)

x = 2x119894minus xmin

xmax minus xminminus 1 (15)

where x is a vector of featurersquos values prior to normalizationand x is the same vector after normalization To determineif a given feature is well modeled by a normal distributionchi-square goodness-of-fit test has been applied

28 Class Imbalance Problem The data set used in thisstudy exhibits class imbalance problem which means thatthe classes are not approximately equally represented (ie itconsists of more than three times as many images of Clarknevus than images of blue nevus) In such case most clas-sifiers will focus on learning how to identify representativesof majority classes leading to poor predictive accuracy forminority classes When making a diagnosis such a situationis unacceptable as misclassification error may lead even topatientrsquos death

The most common way to address this issue is sampling[51] There exist two main types of sampling methodsundersampling which consists in removing representativesof the majority class and oversampling which consists inadding examples of the minority class Out of many samplingtechniques two methods are particularly popular randomundersampling and synthetic minority oversampling technique(SMOTE) In random undersampling method randomlydrawn representatives of the majority class are removed fromthe data set In SMOTE method new ldquosyntheticrdquo examplesare created by averaging a few representatives of the sameminority classes which are nearest neighbors in the featurespace [52]

BioMed Research International 11

(a) Whole area

(b) Inner and outer area

(c) Split along the major axis (d) Split along the minor axis

(e) Split along both major and minor axis simultaneously

Figure 11 A sample division of an object into subregions

In this study SMOTE method has been used as it hasbeen proven to be more effective than random undersam-pling when tested on a set of dermoscopic images obtainedfrom various sources without any common or rare lesiontypes omitted [24]

29 Feature Selection Feature selection consists in reducingdata dimensionality by rejecting redundant unimportant ornoisy features thus resulting in increased prediction accu-racy less complex classifier models and better computationefficiency

Feature selection algorithms may be divided into threemain categories filters wrappers and projections [53 54] Inthis study filter methods have been used for two reasons[24] Firstly as filters are usually fast it is possible to test

a huge number of combinations of features Secondly if onewould like to use wrappers on a given data set the targetlearning algorithm should demonstrate satisfactory resultsfor the original data set as wrappers are based on feedbackprinciple As some features extracted in this study might beirrelevant or redundant as well as due to class imbalancewrappers have not been likely to fulfill those restrictionsOn the other hand projections are used mainly to increasecomputational performance which was not the case in thisstudy

Out of numerous available filters three areworth noticingdue to their satisfactory performance on various data sets[53] ReliefFmutual information-based feature selection andcorrelation-based feature selection [55ndash57] As numerous fea-tures extracted in this study are strongly mutually correlated

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

4 BioMed Research International

Figure 4 Examples of blue nevus based on [2]

Figure 5 Examples of SpitzReed nevus based on [2]

many cases with a high degree of certainty As blue nevi areharmless no treatment is needed

SpitzReed Nevus SpitzReed nevi are typically small well-circumscribed reddish papules larger reddish plaques butalso verrucous plaques with variegated colors and may beup to one or two centimeters in diameter Dermoscopicallyabout 50 of pigmented Spitz nevi show a symmetricappearance and a characteristic starburst pattern or a globularpattern with a regular discrete blue pigmentation in thecenter and a characteristic rim of large brown globules at theperiphery (Figure 5) Spitz nevi are well-known simulatorsof cutaneous melanoma from a clinical dermoscopic andhistopathologic point of view [2]

Malignant Melanoma Malignant melanoma is the mostaggressive type of skin cancer It is due to the uncontrolledgrowth of melanocytes The first sign of melanoma is usuallyan unusual looking mole or spot Melanomamay be detectedat an early stage when melanocytic lesions are only a fewmillimeters in diameter but they also may grow up to severalcentimeters in diameter before being diagnosed

Clinically a melanoma can have a variety of colorsincluding white brown black blue red or even light greyMelanomas in early stage are usually small more or lessirregularly shaped and outlined macules or slightly elevatedplaques with pigmentation that varies from pink to dark

brown or black Invasive melanomas are as a rule papular ornodular often ulcerated and characteristically exhibit shadesof brown and black but also foci of red white or blue col-oration Dermoscopic features describing melanoma containmulticomponent pattern irregular dotsglobules atypicalpigment network irregular streaks irregular pigmentationregression structures and blue whitish veil (Figure 6) Asuspected melanoma should be surgically removed with a2-3mm margin (excision biopsy) and sent to a pathologylaboratory for a microscopic examination [2]

Related Works During patient examinations researchersobserve that young inexperienced dermatologists and fam-ily physicians have huge difficulties in the correct visualassessment of skin lesions As stated in [7 8] only expertshave arrived at 90 sensitivity and 59 specificity in skinlesion diagnosis while for unexperienced physicians thesefigures show a significant drop till around 62-63 for generalpractitioners [9] Due to these obstacles researchers tryto implement and build computer-aided diagnosis (CAD)systems for automated diagnosis of melanoma to increasethe specificity and sensitivity and to simplify the assessmentof skin moles Two reviews on the state of the art of CADsystems for skin lesion diagnosis can be found in [3 10]For these systems the true positive rate ranges between08 and 10 and the true negative rate between 05 and095 Although such computer programs are developed for

BioMed Research International 5

Figure 6 Examples of malignant melanoma based on [2]

Single lesionanalysis

Preprocessing Segmentation

(i) Black frame removal(ii) Hair detection

(iii) Hair inpaintingFeature

extraction(i) Geometrical feature

(ii) Color-based feature(iii) Texture description(iv) Asymmetry

Feature selectionclassification

(i) Black frame removal(ii) Hair detection

(iii) Hair inpaintingFeature

extraction

Figure 7 The proposed algorithm for the classification of melanocytic lesions based on dermoscopic color images

Table 1 A categorization of feature descriptors commonly used inthe computerized analysis of dermoscopic images

Clinicalfeatures Feature descriptors References

Asymmetry Symmetry distance [37]Lesionrsquos centroid [38]

Borderirregularity

Fourier feature [39]Fractal geometry [40]Area and perimeter [38 41 42]Irregularity index [43 44]

Colorvariegation RGB statistical descriptors [38 45]

Diameter Semimajor axis of the ellipse [38]

Otherfeatures

Pattern analysis [38 46 47]Wavelet-based descriptors [48]Texture descriptors [49]Intensity distribution descriptors [5]Haralick descriptors [50]

different diagnostic algorithms to the best of our knowledgea CAD system to classify differentmelanocytic lesions has notyet been proposed Based on the review article [3] in Table 1we present an extended categorization of feature descriptorswhich are commonly used in the computerized analysis ofdermoscopic images

2 Materials and Methods

The designed system for the automated diagnosis of melan-ocytic lesions is a computer-aided diagnosis system which isdesigned to reproduce the decision of dermatologist based onthe dermoscopy images The proposed methodology of dis-crimination between malignant melanoma and nevi tumorsis shown in Figure 7The automated system is divided into sixmain stages including preprocessing (image enhancement)segmentation feature extraction feature selection classifica-tion and then evaluationThe system enables texture analysiswithout being limited by selection and detection of structureof interest

In this section the preprocessing and segmentation stepsare described shortly based on our previous work whilethe main stage which is feature extraction selection andclassification is presented in detail The application has beenimplemented using Matlab ver 2013

21 Dermoscopic Image Preprocessing The main goal ofthe preprocessing step is to improve the image quality byreducing or even removing the unrelated and surplus parts inthe dermoscopic imagesDermoscopic images are inhomoge-neous and complex For dermoscopy images the preprocess-ing step is obligatory because of extraneous artifacts such asskin lines air bubbles and hairs which appear in virtuallyevery image

To enhance the image and to reduce the influence oflight hairs air bubble small pores shines and reflections amedian filter is being used

6 BioMed Research International

(a) (b)

(c) (d)

(e) (f)

Figure 8 Outcome of the preprocessing step (a) input image (b) black frame removal (c) grayscale conversion (d) top-hat transform andbinarization process (e) hair distinction from other structures and (f) inpainting

Among the most necessary artifact rejection steps is hairremoval because hairs may cover parts of the image andmake the segmentation and texture analysis impossible Anumber of methods have been developed for hair removalin dermoscopic images and they were mostly based onmorphological operations and adaptive thresholding [11ndash13]A good approach for hair removal is the use of top-hattransformationThe process consists of four steps convertingRGB to grayscale image applying black top-hat transforma-tion distinguishing hairs from other local structures andinpainting

The dermoscopic RGB image is being converted intograyscale with the NTSC 1953 standard (Figure 8(c)) Sec-ondly the black top-hat transform which is a morphologicalimage processing technique is used to detect thick darkhairs The result of this step is the difference between theclosing operation and the input image

119879119908(119868) = 119868 ∘ 119887 minus 119868 (1)

where ∘ denotes the closing operation 119868 is the grayscale inputimage and 119887 is a grayscale structuring element [14 15]

The black top-hat transform returns an image containingelements that are darker than their surrounding and smallerthan the structuring element

These steps have been precisely described in our work [1415] Figure 8 presents the outcome after each step

22 Segmentation Algorithm The techniques and algorithmsavailable for segmentation of medical structures are specificto application imaging modality and type of body part to bestudied For the dermoscopic images segmentation processis one of the most challenging and crucial processes Thisprocess for dermoscopic images is extremely difficult dueto several factors low contrast between the healthy skinand moles variegate coloring inside lesions and irregularborders as well as different artifacts Due to the difficultiesdescribed above numerousmethods have been implementedand tested [3] Celebi et al present in their research [16]

BioMed Research International 7

Health skin segmentation

(a) (b) (c)

(d) (e)

Figure 9 Results of the segmentation process (andashd) segmentation of the healthy skin with the region-growing algorithm (e) segmentedarea

the state of the art of segmentation methods and comparethem with the statistical region merging as a recent colorimage segmentation technique based on region growingand merging Based on the achieved results and conductedexperiments the skin lesion extraction is performed byseeded region-growing algorithm [17] in regard to twoaspects Firstly during the preprocessing step the healthyskin becomes homogeneous Secondly the whole skinmole isvisible in the dermoscopic image and surrounded by healthyskin It means that the healthy skin surrounds the moleRegion-growing techniques generally give better results innoisy images where edges are extremely difficult to detect

For the skin lesion segmentation we take one seed whichis located in the left upper corner of the image (Figure 9(a))The region is iteratively grown by comparing all unallocatedneighboring pixels to the regionThe region-growing processconsists of picking a seed from the set investigating all4-connected neighbors of this seed and merging suitableneighbors to the seed The seed is then removed and allmerged neighbors are added to the seed set The region-growing process continues until the seed set is empty

These steps have been precisely described in our work[14] In Figure 9 we present the results of the segmentationstep for several iterations of the region-growing algorithm

23 Geometrical Feature Extraction Geometrical featureshave been used mainly to describe lesionrsquos outline as itsirregularity usually indicates malignancy Those features arebased mostly on such properties of an object as areadiameters or geometric moments To ensure robustness ofthe selected features their formulation does not dependdirectly on objectrsquos perimeter (as in case of raster images itis hard to accurately estimate this quantity) All features arenormalized to ensure their scale- rotation- and translation-invariance

To assess lesionrsquos shape the following features havebeen extracted maximal diameter (maximum distancebetween two arbitrary points) equivalent diameter (119863eq =

radic(4120587) sdot Area) variance of the radial distance distributionrectangularity (ratio of area of an object to area of itsbounding box) elongation (aspect ratio of objectrsquos boundingbox) eccentricity Haralickrsquos compactness and normalizeddiscrete compactness [18ndash21]

Variance of the radial distance distribution is given by [18]

119904119889

2

=1

card (119861)sum119901isin119861

(119889 (119901 119862) minus 119889 (119901 119862))2

119889 (119901 119862)2

(2)

where 119861 is a set of all pixels constituting perimeter of object119874 119862 is geometric centroid of 119874 and 119889(119886 119887) is the distancebetween pixels 119886 and 119887 in a chosen metric As studied lesionsare of circular shape the Euclidean metric has been chosen

Eccentricity measures deviation of a conic curve from acircle and (in mathematics) is formulated as ratio of distancebetween foci of an ellipse to the length of its major axisIn image processing eccentricity of an object 119874 is actuallyvalue of eccentricity of an ellipse with same second momentsas object 119874 [19]

120576 =(11989802minus 11989820)2

+ 411989811

(11989802+ 11989820)2

(3)

where 119898119901119902

denotes an image moment of (119901 119902) order How-ever such a formulation of eccentricity still serves the pur-pose of measuring circularity of an object

Haralickrsquos compactness is a measure for circularity of adigital figure defined as 120583

119877120590119877 where 119877 is a random variable

of the distance between the center of the figure to any part ofits perimeter [20]

8 BioMed Research International

Normalized discrete compactness 119862119863119873

measure is basedon counting the number of cell sides common to adjacentpixels of objectrsquos perimeter a measure called discrete com-pactness 119862

119863[21]

119862119863119873

=119862119863minus 119862119863min

119862119863max minus 119862

119863

=119875 minus 2119899 + 2

119875 + 4radic119899

119862119863=4119899 minus 119875

2 119862119863min = 119899 minus 1 119862

119863max =4119899 minus 4radic119899

2

(4)

where119862119863min119862119863max are respectively lower and upper bound

of discrete compactness of a shape composed of 119899 pixels and119875 is the perimeter of the digital region [21]

24 Color-Based Features The analysis of lesionrsquos colors is animportant source of information when determining lesionrsquostype as malignant lesions are characterized by a rich texture[22]

The following color-based features have been extractednumber of colors present within lesionrsquos area (together withinformation about the presence of two specific colors whiteand black) concentricity centroid distance and 119871

lowast

119886lowast

119887lowast

histogram distances [23 24]To determine the number of colors and to calculate the

concentricity an image has been converted to CIE 119871lowast

119886lowast

119887lowast

color space and then 119886lowast and 119887

lowast color channels have beenclustered into four clusters Three clustering algorithms havebeen tested 119896-means clustering kernel 119896-means clusteringand hierarchical agglomerative clustering [25 26] Kernel119896-means algorithm has been tested using Gaussian kernelfor 120590 isin 1 2 [26] In case of hierarchical agglomerativeclustering Wardrsquos minimum variance method based onEuclidean distance between clusters has been applied as thecriterion for choosing the pair of clusters to merge at eachstep [27]

The number of colors has been related to the greatestdistance between clustersrsquo centroids (each pair of clusters hadbeen considered) and it has been assumed that to identify twocolors as significantly different the distance between them in119886lowast

119887lowast space must exceed certain threshold value

119899colors = max(1 lfloor1120591max (119863)rfloor)

119863 = 119889119864(1198881 1198882) 1198881 1198882isin 119862 and 119888

1= 1198882

(5)

where 119862 is a set of clustersrsquo centroids The threshold value 120591has been set to 120591 = 12

It has been assumed that to state the presence of a certaincolor within the lesion the area of biggest cohesive area ofthat color must be greater than 1 of the area of the wholelesion Pixels have been recognized as white if 119871lowast gt 65 and asblack if 119871lowast lt 15 where 119871lowast is the value of luminosity channelfor the given pixel

Based on the aforementioned clusterization a measurecalled ldquoconcentricityrdquo is derived [23] To compute concen-tricity of an object its color clusters are first arranged inascending order regarding the area of their convex hull(ie segment 119878

1has the smallest area of convex hull and

segment 1198784the greatest) and then a few auxiliary indexes are

calculated The 119899(sdot) function stands for the number of pixelsof the given object

For segment 1198781 the percent area index PA is computed

[23]

PA =119899 (119862119862max)

119899 (1198781)

(6)

where 119862119862max is the largest cohesive set of pixels of 1198781 The PAdescribes how well the core is grouped

Let Hull denote the minimal convex area which includesthe entire pixels of 119878

2 The core inclusion CI is then defined

as [23]

CI =119899 (1198781capHull)

119899 (1198781)

(7)

The CI describes how well the second smallest area encirclesthe core

Let Core denote the minimal convex area which includesthe entire pixels of 119878

1 The hull exclusion HE is given by [23]

HE = 1 minus119899 (1198782cap Core)

119899 (1198782)

(8)

TheHEdescribes how exclusively the hull surrounds the coreFinally concentricity is defined as the product of the three

variables [23]

Concentricity = PA times CI timesHE

Concentricity isin [0 1]

(9)

Concentricity closer to one implies a better concentric struc-ture

Images of all regions used to compute concentricity havebeen compiled in Figure 10

The centroid distance for a color channel is defined asthe distance between the geometric centroid (of the binaryobject) and the brightness centroid of that channel Thebrightness centroid may be considered a center of mass ofan object whose density is determined by intensity valuesof its pixels If the pigmentation in a particular channel ishomogeneous the brightness centroid will be close to thegeometric centroid resulting in small value of the centroidaldistance

Color similarity of two regions has been quantified by the1198711- and 119871

2-norm histogram distances for CIE 119871

lowast

119886lowast

119887lowast color

space coarsely quantized into 4 times 8 times 8 bins [24]

1198711(119867119860 119867119861) =

4times8times8

sum

119894=1

1003816100381610038161003816119867119860 (119894) minus 119867119861(119894)1003816100381610038161003816

1198712(119867119860 119867119861) =

4times8times8

sum

119894=1

(119867119860(119894) minus 119867

119861(119894))2

(10)

where119867119860119867119861are histograms of areas 119860 and 119861 respectively

BioMed Research International 9

Object S1 S2 CCmax

Core S1 cap core Hull S2 cap hull

Figure 10 Regions used to compute concentricity of an object

25 Texture Description Quantitative properties of lesionrsquostexture were described with measures based on a gray levelcooccurrence matrix (GLCM) and a gray level run-lengthmatrix (GLRLM) [28 29]

GLCM is a square matrix 119875 where the (119894 119895)th entry of 119875represents information about the frequency of occurrence ofsuch two adjacent pixels where one of them has intensity 119894

and another has intensity 119895 Such informationmay be used toderive second-order statistical measurements characterizingexamined texture GLCM and six measures derived from it(contrast correlation energy homogeneity maximum prob-ability and dissimilarity) have been calculated as describedby Celebi et al [24]

Contrast = sum

119894119895

(119894 minus 119895)2

119875119894119895

Correlation = minussum

119894119895

(119894 minus 120583119909) (119895 minus 120583

119910)

120590119909120590119910

119875119894119895

Energy = sum

119894

sum

119895

[119875119894119895]2

Homogeneity = sum

119894

sum

119895

119875119894119895

1 + (119894 minus 119895)2

Maximum Probability = max119894119895

119875119894119895

Dissimilarity = sum

119894

sum

119895

119875119894119895sdot1003816100381610038161003816119894 minus 119895

1003816100381610038161003816

(11)

With GLRLM it is possible to analyze higher orderstatistical features for the given texture GLRLM is a two-dimensional matrix in which each element 119901(119894 119895 | 120579) gives

the total number of occurrences of runs of length 119895 at graylevel 119894 in a given direction 120579

In our study the direction of runs is irrelevant asdermoscopic camera has no fixed orientation when takingimages of lesionsmdashthere is no ldquoreference orientationrdquo Theonly thing that matters is texturersquos homogeneity thus insteadof calculating measures for individual orientations 120579 isin

0∘

45∘

90∘

135∘

all GLRLM computed for the aforemen-tioned orientations have been added together and mea-sures have been computed only using this new orientation-invariant matrix

GLRLM is used to derive 11measures Five basicmeasures(SRE LRE GLN RLN and RP) describe the distribution ofrunsrsquo lengths (SRE LRE RLN and RP) and runsrsquo intensity(GLN) [29] As SRE and LRE do not consider pixelsrsquo intensityLGRE andHGREmeasures have been proposed [30] FinallySRLGE SRHGE LRLGE and LRHGEmeasures are based onstatistical properties of a joint probability distribution of bothrunsrsquo intensity and length [31]

SRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198952

LRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

GLN =1

119899119903

119866

sum

119894=1

(

119877

sum

119895=1

119894119895)

2

RLN =1

119899119903

119877

sum

119895=1

(

119866

sum

119894=1

119894119895)

2

10 BioMed Research International

RP =119899119903

119899

LGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942

HGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

SRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942 sdot 1198952

SRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

1198952

LRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

1198942

LRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

1198952

119894119895

(12)

where 119899 is the number of image pixels with 119866 gray levels 119877 isthe maximal run length and 119899

119903is the total number of runs in

an image (119899119903= sum119866

119894=1sum119877

119895=1119894119895)

26 Asymmetry The observed asymmetry is a significantdiagnostic premise as in malignant lesions the arrangementof local structures (eg dots streaks and pigmentation nets)is nonuniform across the whole area of a lesion [32] Someresearchers point out that for instance sharp transitionsbetween central and border area indicatemalignancy [24 33]

Asymmetry measures may be defined in terms of ratiosof feature values across various lesionrsquos area segments or interms of changes of feature values between halves obtainedby splitting lesion area into halves using straight section linespassing through the center of mass [32 33]

In our study a different approach has been adopted Thelesion area has been first divided into sets of subregions inthree differentmanners (Figure 11) by splitting (1) into centraland border part (2) into halves along minor and major axisand (3) into quarters using same axes Then a variance offeature values has been calculated for each set of subregionsAdditionally asymmetry indices proposed by Stoecker et al[34] and the following quantities have been computed

119877119876119894

=119891119876119894

119891119882

(13)

where 119891119876119894

denotes value of feature 119891 calculated for 119894thquadrant and 119891

119882for the whole area respectively

27 Data Preparation In order to apply correlated-basefeature selection method (the choice of this particularmethod is discussed later) numerical features have beendiscretizedThis is due to the fact that in the studied problem

the decision feature is a categorical variable which meansit takes on discrete values and the relationship betweenvariables of those two classes cannot be directly assessedTheonly solution is to discretize the variable taking continuousvalues In general terms discretization is a simple logicalcondition which takes into account one or a few attributesand which aims at splitting a range of data into at least twosubsets Data in our study have been discretized accordingto the method proposed by Fayyad and Irani [35] whichuses heuristics minimizing entropy based on the ldquominimumdescription length principlerdquo

To ensure correct work of algorithms based on analysis ofdistances between points in the feature space (eg 119896-nearestneighbors algorithm SVM) for each feature its values havebeen scaled to the range of [minus1 1] Consequently the riskof a situation in which a feature with larger range of valuesdominates other features has been eliminated Normallydistributed features have been transformed using 119885-score(Equation (14)) whereas other features were rescaled linearlyto the desired range (15) [36] Consider

x =x minus x3119904

where x =1

119873sum

119894

x119894 119904 =

1

119873 minus 1(sum

119894

(x119894minus 120583)2

)

(14)

x = 2x119894minus xmin

xmax minus xminminus 1 (15)

where x is a vector of featurersquos values prior to normalizationand x is the same vector after normalization To determineif a given feature is well modeled by a normal distributionchi-square goodness-of-fit test has been applied

28 Class Imbalance Problem The data set used in thisstudy exhibits class imbalance problem which means thatthe classes are not approximately equally represented (ie itconsists of more than three times as many images of Clarknevus than images of blue nevus) In such case most clas-sifiers will focus on learning how to identify representativesof majority classes leading to poor predictive accuracy forminority classes When making a diagnosis such a situationis unacceptable as misclassification error may lead even topatientrsquos death

The most common way to address this issue is sampling[51] There exist two main types of sampling methodsundersampling which consists in removing representativesof the majority class and oversampling which consists inadding examples of the minority class Out of many samplingtechniques two methods are particularly popular randomundersampling and synthetic minority oversampling technique(SMOTE) In random undersampling method randomlydrawn representatives of the majority class are removed fromthe data set In SMOTE method new ldquosyntheticrdquo examplesare created by averaging a few representatives of the sameminority classes which are nearest neighbors in the featurespace [52]

BioMed Research International 11

(a) Whole area

(b) Inner and outer area

(c) Split along the major axis (d) Split along the minor axis

(e) Split along both major and minor axis simultaneously

Figure 11 A sample division of an object into subregions

In this study SMOTE method has been used as it hasbeen proven to be more effective than random undersam-pling when tested on a set of dermoscopic images obtainedfrom various sources without any common or rare lesiontypes omitted [24]

29 Feature Selection Feature selection consists in reducingdata dimensionality by rejecting redundant unimportant ornoisy features thus resulting in increased prediction accu-racy less complex classifier models and better computationefficiency

Feature selection algorithms may be divided into threemain categories filters wrappers and projections [53 54] Inthis study filter methods have been used for two reasons[24] Firstly as filters are usually fast it is possible to test

a huge number of combinations of features Secondly if onewould like to use wrappers on a given data set the targetlearning algorithm should demonstrate satisfactory resultsfor the original data set as wrappers are based on feedbackprinciple As some features extracted in this study might beirrelevant or redundant as well as due to class imbalancewrappers have not been likely to fulfill those restrictionsOn the other hand projections are used mainly to increasecomputational performance which was not the case in thisstudy

Out of numerous available filters three areworth noticingdue to their satisfactory performance on various data sets[53] ReliefFmutual information-based feature selection andcorrelation-based feature selection [55ndash57] As numerous fea-tures extracted in this study are strongly mutually correlated

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

BioMed Research International 5

Figure 6 Examples of malignant melanoma based on [2]

Single lesionanalysis

Preprocessing Segmentation

(i) Black frame removal(ii) Hair detection

(iii) Hair inpaintingFeature

extraction(i) Geometrical feature

(ii) Color-based feature(iii) Texture description(iv) Asymmetry

Feature selectionclassification

(i) Black frame removal(ii) Hair detection

(iii) Hair inpaintingFeature

extraction

Figure 7 The proposed algorithm for the classification of melanocytic lesions based on dermoscopic color images

Table 1 A categorization of feature descriptors commonly used inthe computerized analysis of dermoscopic images

Clinicalfeatures Feature descriptors References

Asymmetry Symmetry distance [37]Lesionrsquos centroid [38]

Borderirregularity

Fourier feature [39]Fractal geometry [40]Area and perimeter [38 41 42]Irregularity index [43 44]

Colorvariegation RGB statistical descriptors [38 45]

Diameter Semimajor axis of the ellipse [38]

Otherfeatures

Pattern analysis [38 46 47]Wavelet-based descriptors [48]Texture descriptors [49]Intensity distribution descriptors [5]Haralick descriptors [50]

different diagnostic algorithms to the best of our knowledgea CAD system to classify differentmelanocytic lesions has notyet been proposed Based on the review article [3] in Table 1we present an extended categorization of feature descriptorswhich are commonly used in the computerized analysis ofdermoscopic images

2 Materials and Methods

The designed system for the automated diagnosis of melan-ocytic lesions is a computer-aided diagnosis system which isdesigned to reproduce the decision of dermatologist based onthe dermoscopy images The proposed methodology of dis-crimination between malignant melanoma and nevi tumorsis shown in Figure 7The automated system is divided into sixmain stages including preprocessing (image enhancement)segmentation feature extraction feature selection classifica-tion and then evaluationThe system enables texture analysiswithout being limited by selection and detection of structureof interest

In this section the preprocessing and segmentation stepsare described shortly based on our previous work whilethe main stage which is feature extraction selection andclassification is presented in detail The application has beenimplemented using Matlab ver 2013

21 Dermoscopic Image Preprocessing The main goal ofthe preprocessing step is to improve the image quality byreducing or even removing the unrelated and surplus parts inthe dermoscopic imagesDermoscopic images are inhomoge-neous and complex For dermoscopy images the preprocess-ing step is obligatory because of extraneous artifacts such asskin lines air bubbles and hairs which appear in virtuallyevery image

To enhance the image and to reduce the influence oflight hairs air bubble small pores shines and reflections amedian filter is being used

6 BioMed Research International

(a) (b)

(c) (d)

(e) (f)

Figure 8 Outcome of the preprocessing step (a) input image (b) black frame removal (c) grayscale conversion (d) top-hat transform andbinarization process (e) hair distinction from other structures and (f) inpainting

Among the most necessary artifact rejection steps is hairremoval because hairs may cover parts of the image andmake the segmentation and texture analysis impossible Anumber of methods have been developed for hair removalin dermoscopic images and they were mostly based onmorphological operations and adaptive thresholding [11ndash13]A good approach for hair removal is the use of top-hattransformationThe process consists of four steps convertingRGB to grayscale image applying black top-hat transforma-tion distinguishing hairs from other local structures andinpainting

The dermoscopic RGB image is being converted intograyscale with the NTSC 1953 standard (Figure 8(c)) Sec-ondly the black top-hat transform which is a morphologicalimage processing technique is used to detect thick darkhairs The result of this step is the difference between theclosing operation and the input image

119879119908(119868) = 119868 ∘ 119887 minus 119868 (1)

where ∘ denotes the closing operation 119868 is the grayscale inputimage and 119887 is a grayscale structuring element [14 15]

The black top-hat transform returns an image containingelements that are darker than their surrounding and smallerthan the structuring element

These steps have been precisely described in our work [1415] Figure 8 presents the outcome after each step

22 Segmentation Algorithm The techniques and algorithmsavailable for segmentation of medical structures are specificto application imaging modality and type of body part to bestudied For the dermoscopic images segmentation processis one of the most challenging and crucial processes Thisprocess for dermoscopic images is extremely difficult dueto several factors low contrast between the healthy skinand moles variegate coloring inside lesions and irregularborders as well as different artifacts Due to the difficultiesdescribed above numerousmethods have been implementedand tested [3] Celebi et al present in their research [16]

BioMed Research International 7

Health skin segmentation

(a) (b) (c)

(d) (e)

Figure 9 Results of the segmentation process (andashd) segmentation of the healthy skin with the region-growing algorithm (e) segmentedarea

the state of the art of segmentation methods and comparethem with the statistical region merging as a recent colorimage segmentation technique based on region growingand merging Based on the achieved results and conductedexperiments the skin lesion extraction is performed byseeded region-growing algorithm [17] in regard to twoaspects Firstly during the preprocessing step the healthyskin becomes homogeneous Secondly the whole skinmole isvisible in the dermoscopic image and surrounded by healthyskin It means that the healthy skin surrounds the moleRegion-growing techniques generally give better results innoisy images where edges are extremely difficult to detect

For the skin lesion segmentation we take one seed whichis located in the left upper corner of the image (Figure 9(a))The region is iteratively grown by comparing all unallocatedneighboring pixels to the regionThe region-growing processconsists of picking a seed from the set investigating all4-connected neighbors of this seed and merging suitableneighbors to the seed The seed is then removed and allmerged neighbors are added to the seed set The region-growing process continues until the seed set is empty

These steps have been precisely described in our work[14] In Figure 9 we present the results of the segmentationstep for several iterations of the region-growing algorithm

23 Geometrical Feature Extraction Geometrical featureshave been used mainly to describe lesionrsquos outline as itsirregularity usually indicates malignancy Those features arebased mostly on such properties of an object as areadiameters or geometric moments To ensure robustness ofthe selected features their formulation does not dependdirectly on objectrsquos perimeter (as in case of raster images itis hard to accurately estimate this quantity) All features arenormalized to ensure their scale- rotation- and translation-invariance

To assess lesionrsquos shape the following features havebeen extracted maximal diameter (maximum distancebetween two arbitrary points) equivalent diameter (119863eq =

radic(4120587) sdot Area) variance of the radial distance distributionrectangularity (ratio of area of an object to area of itsbounding box) elongation (aspect ratio of objectrsquos boundingbox) eccentricity Haralickrsquos compactness and normalizeddiscrete compactness [18ndash21]

Variance of the radial distance distribution is given by [18]

119904119889

2

=1

card (119861)sum119901isin119861

(119889 (119901 119862) minus 119889 (119901 119862))2

119889 (119901 119862)2

(2)

where 119861 is a set of all pixels constituting perimeter of object119874 119862 is geometric centroid of 119874 and 119889(119886 119887) is the distancebetween pixels 119886 and 119887 in a chosen metric As studied lesionsare of circular shape the Euclidean metric has been chosen

Eccentricity measures deviation of a conic curve from acircle and (in mathematics) is formulated as ratio of distancebetween foci of an ellipse to the length of its major axisIn image processing eccentricity of an object 119874 is actuallyvalue of eccentricity of an ellipse with same second momentsas object 119874 [19]

120576 =(11989802minus 11989820)2

+ 411989811

(11989802+ 11989820)2

(3)

where 119898119901119902

denotes an image moment of (119901 119902) order How-ever such a formulation of eccentricity still serves the pur-pose of measuring circularity of an object

Haralickrsquos compactness is a measure for circularity of adigital figure defined as 120583

119877120590119877 where 119877 is a random variable

of the distance between the center of the figure to any part ofits perimeter [20]

8 BioMed Research International

Normalized discrete compactness 119862119863119873

measure is basedon counting the number of cell sides common to adjacentpixels of objectrsquos perimeter a measure called discrete com-pactness 119862

119863[21]

119862119863119873

=119862119863minus 119862119863min

119862119863max minus 119862

119863

=119875 minus 2119899 + 2

119875 + 4radic119899

119862119863=4119899 minus 119875

2 119862119863min = 119899 minus 1 119862

119863max =4119899 minus 4radic119899

2

(4)

where119862119863min119862119863max are respectively lower and upper bound

of discrete compactness of a shape composed of 119899 pixels and119875 is the perimeter of the digital region [21]

24 Color-Based Features The analysis of lesionrsquos colors is animportant source of information when determining lesionrsquostype as malignant lesions are characterized by a rich texture[22]

The following color-based features have been extractednumber of colors present within lesionrsquos area (together withinformation about the presence of two specific colors whiteand black) concentricity centroid distance and 119871

lowast

119886lowast

119887lowast

histogram distances [23 24]To determine the number of colors and to calculate the

concentricity an image has been converted to CIE 119871lowast

119886lowast

119887lowast

color space and then 119886lowast and 119887

lowast color channels have beenclustered into four clusters Three clustering algorithms havebeen tested 119896-means clustering kernel 119896-means clusteringand hierarchical agglomerative clustering [25 26] Kernel119896-means algorithm has been tested using Gaussian kernelfor 120590 isin 1 2 [26] In case of hierarchical agglomerativeclustering Wardrsquos minimum variance method based onEuclidean distance between clusters has been applied as thecriterion for choosing the pair of clusters to merge at eachstep [27]

The number of colors has been related to the greatestdistance between clustersrsquo centroids (each pair of clusters hadbeen considered) and it has been assumed that to identify twocolors as significantly different the distance between them in119886lowast

119887lowast space must exceed certain threshold value

119899colors = max(1 lfloor1120591max (119863)rfloor)

119863 = 119889119864(1198881 1198882) 1198881 1198882isin 119862 and 119888

1= 1198882

(5)

where 119862 is a set of clustersrsquo centroids The threshold value 120591has been set to 120591 = 12

It has been assumed that to state the presence of a certaincolor within the lesion the area of biggest cohesive area ofthat color must be greater than 1 of the area of the wholelesion Pixels have been recognized as white if 119871lowast gt 65 and asblack if 119871lowast lt 15 where 119871lowast is the value of luminosity channelfor the given pixel

Based on the aforementioned clusterization a measurecalled ldquoconcentricityrdquo is derived [23] To compute concen-tricity of an object its color clusters are first arranged inascending order regarding the area of their convex hull(ie segment 119878

1has the smallest area of convex hull and

segment 1198784the greatest) and then a few auxiliary indexes are

calculated The 119899(sdot) function stands for the number of pixelsof the given object

For segment 1198781 the percent area index PA is computed

[23]

PA =119899 (119862119862max)

119899 (1198781)

(6)

where 119862119862max is the largest cohesive set of pixels of 1198781 The PAdescribes how well the core is grouped

Let Hull denote the minimal convex area which includesthe entire pixels of 119878

2 The core inclusion CI is then defined

as [23]

CI =119899 (1198781capHull)

119899 (1198781)

(7)

The CI describes how well the second smallest area encirclesthe core

Let Core denote the minimal convex area which includesthe entire pixels of 119878

1 The hull exclusion HE is given by [23]

HE = 1 minus119899 (1198782cap Core)

119899 (1198782)

(8)

TheHEdescribes how exclusively the hull surrounds the coreFinally concentricity is defined as the product of the three

variables [23]

Concentricity = PA times CI timesHE

Concentricity isin [0 1]

(9)

Concentricity closer to one implies a better concentric struc-ture

Images of all regions used to compute concentricity havebeen compiled in Figure 10

The centroid distance for a color channel is defined asthe distance between the geometric centroid (of the binaryobject) and the brightness centroid of that channel Thebrightness centroid may be considered a center of mass ofan object whose density is determined by intensity valuesof its pixels If the pigmentation in a particular channel ishomogeneous the brightness centroid will be close to thegeometric centroid resulting in small value of the centroidaldistance

Color similarity of two regions has been quantified by the1198711- and 119871

2-norm histogram distances for CIE 119871

lowast

119886lowast

119887lowast color

space coarsely quantized into 4 times 8 times 8 bins [24]

1198711(119867119860 119867119861) =

4times8times8

sum

119894=1

1003816100381610038161003816119867119860 (119894) minus 119867119861(119894)1003816100381610038161003816

1198712(119867119860 119867119861) =

4times8times8

sum

119894=1

(119867119860(119894) minus 119867

119861(119894))2

(10)

where119867119860119867119861are histograms of areas 119860 and 119861 respectively

BioMed Research International 9

Object S1 S2 CCmax

Core S1 cap core Hull S2 cap hull

Figure 10 Regions used to compute concentricity of an object

25 Texture Description Quantitative properties of lesionrsquostexture were described with measures based on a gray levelcooccurrence matrix (GLCM) and a gray level run-lengthmatrix (GLRLM) [28 29]

GLCM is a square matrix 119875 where the (119894 119895)th entry of 119875represents information about the frequency of occurrence ofsuch two adjacent pixels where one of them has intensity 119894

and another has intensity 119895 Such informationmay be used toderive second-order statistical measurements characterizingexamined texture GLCM and six measures derived from it(contrast correlation energy homogeneity maximum prob-ability and dissimilarity) have been calculated as describedby Celebi et al [24]

Contrast = sum

119894119895

(119894 minus 119895)2

119875119894119895

Correlation = minussum

119894119895

(119894 minus 120583119909) (119895 minus 120583

119910)

120590119909120590119910

119875119894119895

Energy = sum

119894

sum

119895

[119875119894119895]2

Homogeneity = sum

119894

sum

119895

119875119894119895

1 + (119894 minus 119895)2

Maximum Probability = max119894119895

119875119894119895

Dissimilarity = sum

119894

sum

119895

119875119894119895sdot1003816100381610038161003816119894 minus 119895

1003816100381610038161003816

(11)

With GLRLM it is possible to analyze higher orderstatistical features for the given texture GLRLM is a two-dimensional matrix in which each element 119901(119894 119895 | 120579) gives

the total number of occurrences of runs of length 119895 at graylevel 119894 in a given direction 120579

In our study the direction of runs is irrelevant asdermoscopic camera has no fixed orientation when takingimages of lesionsmdashthere is no ldquoreference orientationrdquo Theonly thing that matters is texturersquos homogeneity thus insteadof calculating measures for individual orientations 120579 isin

0∘

45∘

90∘

135∘

all GLRLM computed for the aforemen-tioned orientations have been added together and mea-sures have been computed only using this new orientation-invariant matrix

GLRLM is used to derive 11measures Five basicmeasures(SRE LRE GLN RLN and RP) describe the distribution ofrunsrsquo lengths (SRE LRE RLN and RP) and runsrsquo intensity(GLN) [29] As SRE and LRE do not consider pixelsrsquo intensityLGRE andHGREmeasures have been proposed [30] FinallySRLGE SRHGE LRLGE and LRHGEmeasures are based onstatistical properties of a joint probability distribution of bothrunsrsquo intensity and length [31]

SRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198952

LRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

GLN =1

119899119903

119866

sum

119894=1

(

119877

sum

119895=1

119894119895)

2

RLN =1

119899119903

119877

sum

119895=1

(

119866

sum

119894=1

119894119895)

2

10 BioMed Research International

RP =119899119903

119899

LGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942

HGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

SRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942 sdot 1198952

SRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

1198952

LRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

1198942

LRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

1198952

119894119895

(12)

where 119899 is the number of image pixels with 119866 gray levels 119877 isthe maximal run length and 119899

119903is the total number of runs in

an image (119899119903= sum119866

119894=1sum119877

119895=1119894119895)

26 Asymmetry The observed asymmetry is a significantdiagnostic premise as in malignant lesions the arrangementof local structures (eg dots streaks and pigmentation nets)is nonuniform across the whole area of a lesion [32] Someresearchers point out that for instance sharp transitionsbetween central and border area indicatemalignancy [24 33]

Asymmetry measures may be defined in terms of ratiosof feature values across various lesionrsquos area segments or interms of changes of feature values between halves obtainedby splitting lesion area into halves using straight section linespassing through the center of mass [32 33]

In our study a different approach has been adopted Thelesion area has been first divided into sets of subregions inthree differentmanners (Figure 11) by splitting (1) into centraland border part (2) into halves along minor and major axisand (3) into quarters using same axes Then a variance offeature values has been calculated for each set of subregionsAdditionally asymmetry indices proposed by Stoecker et al[34] and the following quantities have been computed

119877119876119894

=119891119876119894

119891119882

(13)

where 119891119876119894

denotes value of feature 119891 calculated for 119894thquadrant and 119891

119882for the whole area respectively

27 Data Preparation In order to apply correlated-basefeature selection method (the choice of this particularmethod is discussed later) numerical features have beendiscretizedThis is due to the fact that in the studied problem

the decision feature is a categorical variable which meansit takes on discrete values and the relationship betweenvariables of those two classes cannot be directly assessedTheonly solution is to discretize the variable taking continuousvalues In general terms discretization is a simple logicalcondition which takes into account one or a few attributesand which aims at splitting a range of data into at least twosubsets Data in our study have been discretized accordingto the method proposed by Fayyad and Irani [35] whichuses heuristics minimizing entropy based on the ldquominimumdescription length principlerdquo

To ensure correct work of algorithms based on analysis ofdistances between points in the feature space (eg 119896-nearestneighbors algorithm SVM) for each feature its values havebeen scaled to the range of [minus1 1] Consequently the riskof a situation in which a feature with larger range of valuesdominates other features has been eliminated Normallydistributed features have been transformed using 119885-score(Equation (14)) whereas other features were rescaled linearlyto the desired range (15) [36] Consider

x =x minus x3119904

where x =1

119873sum

119894

x119894 119904 =

1

119873 minus 1(sum

119894

(x119894minus 120583)2

)

(14)

x = 2x119894minus xmin

xmax minus xminminus 1 (15)

where x is a vector of featurersquos values prior to normalizationand x is the same vector after normalization To determineif a given feature is well modeled by a normal distributionchi-square goodness-of-fit test has been applied

28 Class Imbalance Problem The data set used in thisstudy exhibits class imbalance problem which means thatthe classes are not approximately equally represented (ie itconsists of more than three times as many images of Clarknevus than images of blue nevus) In such case most clas-sifiers will focus on learning how to identify representativesof majority classes leading to poor predictive accuracy forminority classes When making a diagnosis such a situationis unacceptable as misclassification error may lead even topatientrsquos death

The most common way to address this issue is sampling[51] There exist two main types of sampling methodsundersampling which consists in removing representativesof the majority class and oversampling which consists inadding examples of the minority class Out of many samplingtechniques two methods are particularly popular randomundersampling and synthetic minority oversampling technique(SMOTE) In random undersampling method randomlydrawn representatives of the majority class are removed fromthe data set In SMOTE method new ldquosyntheticrdquo examplesare created by averaging a few representatives of the sameminority classes which are nearest neighbors in the featurespace [52]

BioMed Research International 11

(a) Whole area

(b) Inner and outer area

(c) Split along the major axis (d) Split along the minor axis

(e) Split along both major and minor axis simultaneously

Figure 11 A sample division of an object into subregions

In this study SMOTE method has been used as it hasbeen proven to be more effective than random undersam-pling when tested on a set of dermoscopic images obtainedfrom various sources without any common or rare lesiontypes omitted [24]

29 Feature Selection Feature selection consists in reducingdata dimensionality by rejecting redundant unimportant ornoisy features thus resulting in increased prediction accu-racy less complex classifier models and better computationefficiency

Feature selection algorithms may be divided into threemain categories filters wrappers and projections [53 54] Inthis study filter methods have been used for two reasons[24] Firstly as filters are usually fast it is possible to test

a huge number of combinations of features Secondly if onewould like to use wrappers on a given data set the targetlearning algorithm should demonstrate satisfactory resultsfor the original data set as wrappers are based on feedbackprinciple As some features extracted in this study might beirrelevant or redundant as well as due to class imbalancewrappers have not been likely to fulfill those restrictionsOn the other hand projections are used mainly to increasecomputational performance which was not the case in thisstudy

Out of numerous available filters three areworth noticingdue to their satisfactory performance on various data sets[53] ReliefFmutual information-based feature selection andcorrelation-based feature selection [55ndash57] As numerous fea-tures extracted in this study are strongly mutually correlated

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

6 BioMed Research International

(a) (b)

(c) (d)

(e) (f)

Figure 8 Outcome of the preprocessing step (a) input image (b) black frame removal (c) grayscale conversion (d) top-hat transform andbinarization process (e) hair distinction from other structures and (f) inpainting

Among the most necessary artifact rejection steps is hairremoval because hairs may cover parts of the image andmake the segmentation and texture analysis impossible Anumber of methods have been developed for hair removalin dermoscopic images and they were mostly based onmorphological operations and adaptive thresholding [11ndash13]A good approach for hair removal is the use of top-hattransformationThe process consists of four steps convertingRGB to grayscale image applying black top-hat transforma-tion distinguishing hairs from other local structures andinpainting

The dermoscopic RGB image is being converted intograyscale with the NTSC 1953 standard (Figure 8(c)) Sec-ondly the black top-hat transform which is a morphologicalimage processing technique is used to detect thick darkhairs The result of this step is the difference between theclosing operation and the input image

119879119908(119868) = 119868 ∘ 119887 minus 119868 (1)

where ∘ denotes the closing operation 119868 is the grayscale inputimage and 119887 is a grayscale structuring element [14 15]

The black top-hat transform returns an image containingelements that are darker than their surrounding and smallerthan the structuring element

These steps have been precisely described in our work [1415] Figure 8 presents the outcome after each step

22 Segmentation Algorithm The techniques and algorithmsavailable for segmentation of medical structures are specificto application imaging modality and type of body part to bestudied For the dermoscopic images segmentation processis one of the most challenging and crucial processes Thisprocess for dermoscopic images is extremely difficult dueto several factors low contrast between the healthy skinand moles variegate coloring inside lesions and irregularborders as well as different artifacts Due to the difficultiesdescribed above numerousmethods have been implementedand tested [3] Celebi et al present in their research [16]

BioMed Research International 7

Health skin segmentation

(a) (b) (c)

(d) (e)

Figure 9 Results of the segmentation process (andashd) segmentation of the healthy skin with the region-growing algorithm (e) segmentedarea

the state of the art of segmentation methods and comparethem with the statistical region merging as a recent colorimage segmentation technique based on region growingand merging Based on the achieved results and conductedexperiments the skin lesion extraction is performed byseeded region-growing algorithm [17] in regard to twoaspects Firstly during the preprocessing step the healthyskin becomes homogeneous Secondly the whole skinmole isvisible in the dermoscopic image and surrounded by healthyskin It means that the healthy skin surrounds the moleRegion-growing techniques generally give better results innoisy images where edges are extremely difficult to detect

For the skin lesion segmentation we take one seed whichis located in the left upper corner of the image (Figure 9(a))The region is iteratively grown by comparing all unallocatedneighboring pixels to the regionThe region-growing processconsists of picking a seed from the set investigating all4-connected neighbors of this seed and merging suitableneighbors to the seed The seed is then removed and allmerged neighbors are added to the seed set The region-growing process continues until the seed set is empty

These steps have been precisely described in our work[14] In Figure 9 we present the results of the segmentationstep for several iterations of the region-growing algorithm

23 Geometrical Feature Extraction Geometrical featureshave been used mainly to describe lesionrsquos outline as itsirregularity usually indicates malignancy Those features arebased mostly on such properties of an object as areadiameters or geometric moments To ensure robustness ofthe selected features their formulation does not dependdirectly on objectrsquos perimeter (as in case of raster images itis hard to accurately estimate this quantity) All features arenormalized to ensure their scale- rotation- and translation-invariance

To assess lesionrsquos shape the following features havebeen extracted maximal diameter (maximum distancebetween two arbitrary points) equivalent diameter (119863eq =

radic(4120587) sdot Area) variance of the radial distance distributionrectangularity (ratio of area of an object to area of itsbounding box) elongation (aspect ratio of objectrsquos boundingbox) eccentricity Haralickrsquos compactness and normalizeddiscrete compactness [18ndash21]

Variance of the radial distance distribution is given by [18]

119904119889

2

=1

card (119861)sum119901isin119861

(119889 (119901 119862) minus 119889 (119901 119862))2

119889 (119901 119862)2

(2)

where 119861 is a set of all pixels constituting perimeter of object119874 119862 is geometric centroid of 119874 and 119889(119886 119887) is the distancebetween pixels 119886 and 119887 in a chosen metric As studied lesionsare of circular shape the Euclidean metric has been chosen

Eccentricity measures deviation of a conic curve from acircle and (in mathematics) is formulated as ratio of distancebetween foci of an ellipse to the length of its major axisIn image processing eccentricity of an object 119874 is actuallyvalue of eccentricity of an ellipse with same second momentsas object 119874 [19]

120576 =(11989802minus 11989820)2

+ 411989811

(11989802+ 11989820)2

(3)

where 119898119901119902

denotes an image moment of (119901 119902) order How-ever such a formulation of eccentricity still serves the pur-pose of measuring circularity of an object

Haralickrsquos compactness is a measure for circularity of adigital figure defined as 120583

119877120590119877 where 119877 is a random variable

of the distance between the center of the figure to any part ofits perimeter [20]

8 BioMed Research International

Normalized discrete compactness 119862119863119873

measure is basedon counting the number of cell sides common to adjacentpixels of objectrsquos perimeter a measure called discrete com-pactness 119862

119863[21]

119862119863119873

=119862119863minus 119862119863min

119862119863max minus 119862

119863

=119875 minus 2119899 + 2

119875 + 4radic119899

119862119863=4119899 minus 119875

2 119862119863min = 119899 minus 1 119862

119863max =4119899 minus 4radic119899

2

(4)

where119862119863min119862119863max are respectively lower and upper bound

of discrete compactness of a shape composed of 119899 pixels and119875 is the perimeter of the digital region [21]

24 Color-Based Features The analysis of lesionrsquos colors is animportant source of information when determining lesionrsquostype as malignant lesions are characterized by a rich texture[22]

The following color-based features have been extractednumber of colors present within lesionrsquos area (together withinformation about the presence of two specific colors whiteand black) concentricity centroid distance and 119871

lowast

119886lowast

119887lowast

histogram distances [23 24]To determine the number of colors and to calculate the

concentricity an image has been converted to CIE 119871lowast

119886lowast

119887lowast

color space and then 119886lowast and 119887

lowast color channels have beenclustered into four clusters Three clustering algorithms havebeen tested 119896-means clustering kernel 119896-means clusteringand hierarchical agglomerative clustering [25 26] Kernel119896-means algorithm has been tested using Gaussian kernelfor 120590 isin 1 2 [26] In case of hierarchical agglomerativeclustering Wardrsquos minimum variance method based onEuclidean distance between clusters has been applied as thecriterion for choosing the pair of clusters to merge at eachstep [27]

The number of colors has been related to the greatestdistance between clustersrsquo centroids (each pair of clusters hadbeen considered) and it has been assumed that to identify twocolors as significantly different the distance between them in119886lowast

119887lowast space must exceed certain threshold value

119899colors = max(1 lfloor1120591max (119863)rfloor)

119863 = 119889119864(1198881 1198882) 1198881 1198882isin 119862 and 119888

1= 1198882

(5)

where 119862 is a set of clustersrsquo centroids The threshold value 120591has been set to 120591 = 12

It has been assumed that to state the presence of a certaincolor within the lesion the area of biggest cohesive area ofthat color must be greater than 1 of the area of the wholelesion Pixels have been recognized as white if 119871lowast gt 65 and asblack if 119871lowast lt 15 where 119871lowast is the value of luminosity channelfor the given pixel

Based on the aforementioned clusterization a measurecalled ldquoconcentricityrdquo is derived [23] To compute concen-tricity of an object its color clusters are first arranged inascending order regarding the area of their convex hull(ie segment 119878

1has the smallest area of convex hull and

segment 1198784the greatest) and then a few auxiliary indexes are

calculated The 119899(sdot) function stands for the number of pixelsof the given object

For segment 1198781 the percent area index PA is computed

[23]

PA =119899 (119862119862max)

119899 (1198781)

(6)

where 119862119862max is the largest cohesive set of pixels of 1198781 The PAdescribes how well the core is grouped

Let Hull denote the minimal convex area which includesthe entire pixels of 119878

2 The core inclusion CI is then defined

as [23]

CI =119899 (1198781capHull)

119899 (1198781)

(7)

The CI describes how well the second smallest area encirclesthe core

Let Core denote the minimal convex area which includesthe entire pixels of 119878

1 The hull exclusion HE is given by [23]

HE = 1 minus119899 (1198782cap Core)

119899 (1198782)

(8)

TheHEdescribes how exclusively the hull surrounds the coreFinally concentricity is defined as the product of the three

variables [23]

Concentricity = PA times CI timesHE

Concentricity isin [0 1]

(9)

Concentricity closer to one implies a better concentric struc-ture

Images of all regions used to compute concentricity havebeen compiled in Figure 10

The centroid distance for a color channel is defined asthe distance between the geometric centroid (of the binaryobject) and the brightness centroid of that channel Thebrightness centroid may be considered a center of mass ofan object whose density is determined by intensity valuesof its pixels If the pigmentation in a particular channel ishomogeneous the brightness centroid will be close to thegeometric centroid resulting in small value of the centroidaldistance

Color similarity of two regions has been quantified by the1198711- and 119871

2-norm histogram distances for CIE 119871

lowast

119886lowast

119887lowast color

space coarsely quantized into 4 times 8 times 8 bins [24]

1198711(119867119860 119867119861) =

4times8times8

sum

119894=1

1003816100381610038161003816119867119860 (119894) minus 119867119861(119894)1003816100381610038161003816

1198712(119867119860 119867119861) =

4times8times8

sum

119894=1

(119867119860(119894) minus 119867

119861(119894))2

(10)

where119867119860119867119861are histograms of areas 119860 and 119861 respectively

BioMed Research International 9

Object S1 S2 CCmax

Core S1 cap core Hull S2 cap hull

Figure 10 Regions used to compute concentricity of an object

25 Texture Description Quantitative properties of lesionrsquostexture were described with measures based on a gray levelcooccurrence matrix (GLCM) and a gray level run-lengthmatrix (GLRLM) [28 29]

GLCM is a square matrix 119875 where the (119894 119895)th entry of 119875represents information about the frequency of occurrence ofsuch two adjacent pixels where one of them has intensity 119894

and another has intensity 119895 Such informationmay be used toderive second-order statistical measurements characterizingexamined texture GLCM and six measures derived from it(contrast correlation energy homogeneity maximum prob-ability and dissimilarity) have been calculated as describedby Celebi et al [24]

Contrast = sum

119894119895

(119894 minus 119895)2

119875119894119895

Correlation = minussum

119894119895

(119894 minus 120583119909) (119895 minus 120583

119910)

120590119909120590119910

119875119894119895

Energy = sum

119894

sum

119895

[119875119894119895]2

Homogeneity = sum

119894

sum

119895

119875119894119895

1 + (119894 minus 119895)2

Maximum Probability = max119894119895

119875119894119895

Dissimilarity = sum

119894

sum

119895

119875119894119895sdot1003816100381610038161003816119894 minus 119895

1003816100381610038161003816

(11)

With GLRLM it is possible to analyze higher orderstatistical features for the given texture GLRLM is a two-dimensional matrix in which each element 119901(119894 119895 | 120579) gives

the total number of occurrences of runs of length 119895 at graylevel 119894 in a given direction 120579

In our study the direction of runs is irrelevant asdermoscopic camera has no fixed orientation when takingimages of lesionsmdashthere is no ldquoreference orientationrdquo Theonly thing that matters is texturersquos homogeneity thus insteadof calculating measures for individual orientations 120579 isin

0∘

45∘

90∘

135∘

all GLRLM computed for the aforemen-tioned orientations have been added together and mea-sures have been computed only using this new orientation-invariant matrix

GLRLM is used to derive 11measures Five basicmeasures(SRE LRE GLN RLN and RP) describe the distribution ofrunsrsquo lengths (SRE LRE RLN and RP) and runsrsquo intensity(GLN) [29] As SRE and LRE do not consider pixelsrsquo intensityLGRE andHGREmeasures have been proposed [30] FinallySRLGE SRHGE LRLGE and LRHGEmeasures are based onstatistical properties of a joint probability distribution of bothrunsrsquo intensity and length [31]

SRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198952

LRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

GLN =1

119899119903

119866

sum

119894=1

(

119877

sum

119895=1

119894119895)

2

RLN =1

119899119903

119877

sum

119895=1

(

119866

sum

119894=1

119894119895)

2

10 BioMed Research International

RP =119899119903

119899

LGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942

HGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

SRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942 sdot 1198952

SRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

1198952

LRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

1198942

LRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

1198952

119894119895

(12)

where 119899 is the number of image pixels with 119866 gray levels 119877 isthe maximal run length and 119899

119903is the total number of runs in

an image (119899119903= sum119866

119894=1sum119877

119895=1119894119895)

26 Asymmetry The observed asymmetry is a significantdiagnostic premise as in malignant lesions the arrangementof local structures (eg dots streaks and pigmentation nets)is nonuniform across the whole area of a lesion [32] Someresearchers point out that for instance sharp transitionsbetween central and border area indicatemalignancy [24 33]

Asymmetry measures may be defined in terms of ratiosof feature values across various lesionrsquos area segments or interms of changes of feature values between halves obtainedby splitting lesion area into halves using straight section linespassing through the center of mass [32 33]

In our study a different approach has been adopted Thelesion area has been first divided into sets of subregions inthree differentmanners (Figure 11) by splitting (1) into centraland border part (2) into halves along minor and major axisand (3) into quarters using same axes Then a variance offeature values has been calculated for each set of subregionsAdditionally asymmetry indices proposed by Stoecker et al[34] and the following quantities have been computed

119877119876119894

=119891119876119894

119891119882

(13)

where 119891119876119894

denotes value of feature 119891 calculated for 119894thquadrant and 119891

119882for the whole area respectively

27 Data Preparation In order to apply correlated-basefeature selection method (the choice of this particularmethod is discussed later) numerical features have beendiscretizedThis is due to the fact that in the studied problem

the decision feature is a categorical variable which meansit takes on discrete values and the relationship betweenvariables of those two classes cannot be directly assessedTheonly solution is to discretize the variable taking continuousvalues In general terms discretization is a simple logicalcondition which takes into account one or a few attributesand which aims at splitting a range of data into at least twosubsets Data in our study have been discretized accordingto the method proposed by Fayyad and Irani [35] whichuses heuristics minimizing entropy based on the ldquominimumdescription length principlerdquo

To ensure correct work of algorithms based on analysis ofdistances between points in the feature space (eg 119896-nearestneighbors algorithm SVM) for each feature its values havebeen scaled to the range of [minus1 1] Consequently the riskof a situation in which a feature with larger range of valuesdominates other features has been eliminated Normallydistributed features have been transformed using 119885-score(Equation (14)) whereas other features were rescaled linearlyto the desired range (15) [36] Consider

x =x minus x3119904

where x =1

119873sum

119894

x119894 119904 =

1

119873 minus 1(sum

119894

(x119894minus 120583)2

)

(14)

x = 2x119894minus xmin

xmax minus xminminus 1 (15)

where x is a vector of featurersquos values prior to normalizationand x is the same vector after normalization To determineif a given feature is well modeled by a normal distributionchi-square goodness-of-fit test has been applied

28 Class Imbalance Problem The data set used in thisstudy exhibits class imbalance problem which means thatthe classes are not approximately equally represented (ie itconsists of more than three times as many images of Clarknevus than images of blue nevus) In such case most clas-sifiers will focus on learning how to identify representativesof majority classes leading to poor predictive accuracy forminority classes When making a diagnosis such a situationis unacceptable as misclassification error may lead even topatientrsquos death

The most common way to address this issue is sampling[51] There exist two main types of sampling methodsundersampling which consists in removing representativesof the majority class and oversampling which consists inadding examples of the minority class Out of many samplingtechniques two methods are particularly popular randomundersampling and synthetic minority oversampling technique(SMOTE) In random undersampling method randomlydrawn representatives of the majority class are removed fromthe data set In SMOTE method new ldquosyntheticrdquo examplesare created by averaging a few representatives of the sameminority classes which are nearest neighbors in the featurespace [52]

BioMed Research International 11

(a) Whole area

(b) Inner and outer area

(c) Split along the major axis (d) Split along the minor axis

(e) Split along both major and minor axis simultaneously

Figure 11 A sample division of an object into subregions

In this study SMOTE method has been used as it hasbeen proven to be more effective than random undersam-pling when tested on a set of dermoscopic images obtainedfrom various sources without any common or rare lesiontypes omitted [24]

29 Feature Selection Feature selection consists in reducingdata dimensionality by rejecting redundant unimportant ornoisy features thus resulting in increased prediction accu-racy less complex classifier models and better computationefficiency

Feature selection algorithms may be divided into threemain categories filters wrappers and projections [53 54] Inthis study filter methods have been used for two reasons[24] Firstly as filters are usually fast it is possible to test

a huge number of combinations of features Secondly if onewould like to use wrappers on a given data set the targetlearning algorithm should demonstrate satisfactory resultsfor the original data set as wrappers are based on feedbackprinciple As some features extracted in this study might beirrelevant or redundant as well as due to class imbalancewrappers have not been likely to fulfill those restrictionsOn the other hand projections are used mainly to increasecomputational performance which was not the case in thisstudy

Out of numerous available filters three areworth noticingdue to their satisfactory performance on various data sets[53] ReliefFmutual information-based feature selection andcorrelation-based feature selection [55ndash57] As numerous fea-tures extracted in this study are strongly mutually correlated

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

BioMed Research International 7

Health skin segmentation

(a) (b) (c)

(d) (e)

Figure 9 Results of the segmentation process (andashd) segmentation of the healthy skin with the region-growing algorithm (e) segmentedarea

the state of the art of segmentation methods and comparethem with the statistical region merging as a recent colorimage segmentation technique based on region growingand merging Based on the achieved results and conductedexperiments the skin lesion extraction is performed byseeded region-growing algorithm [17] in regard to twoaspects Firstly during the preprocessing step the healthyskin becomes homogeneous Secondly the whole skinmole isvisible in the dermoscopic image and surrounded by healthyskin It means that the healthy skin surrounds the moleRegion-growing techniques generally give better results innoisy images where edges are extremely difficult to detect

For the skin lesion segmentation we take one seed whichis located in the left upper corner of the image (Figure 9(a))The region is iteratively grown by comparing all unallocatedneighboring pixels to the regionThe region-growing processconsists of picking a seed from the set investigating all4-connected neighbors of this seed and merging suitableneighbors to the seed The seed is then removed and allmerged neighbors are added to the seed set The region-growing process continues until the seed set is empty

These steps have been precisely described in our work[14] In Figure 9 we present the results of the segmentationstep for several iterations of the region-growing algorithm

23 Geometrical Feature Extraction Geometrical featureshave been used mainly to describe lesionrsquos outline as itsirregularity usually indicates malignancy Those features arebased mostly on such properties of an object as areadiameters or geometric moments To ensure robustness ofthe selected features their formulation does not dependdirectly on objectrsquos perimeter (as in case of raster images itis hard to accurately estimate this quantity) All features arenormalized to ensure their scale- rotation- and translation-invariance

To assess lesionrsquos shape the following features havebeen extracted maximal diameter (maximum distancebetween two arbitrary points) equivalent diameter (119863eq =

radic(4120587) sdot Area) variance of the radial distance distributionrectangularity (ratio of area of an object to area of itsbounding box) elongation (aspect ratio of objectrsquos boundingbox) eccentricity Haralickrsquos compactness and normalizeddiscrete compactness [18ndash21]

Variance of the radial distance distribution is given by [18]

119904119889

2

=1

card (119861)sum119901isin119861

(119889 (119901 119862) minus 119889 (119901 119862))2

119889 (119901 119862)2

(2)

where 119861 is a set of all pixels constituting perimeter of object119874 119862 is geometric centroid of 119874 and 119889(119886 119887) is the distancebetween pixels 119886 and 119887 in a chosen metric As studied lesionsare of circular shape the Euclidean metric has been chosen

Eccentricity measures deviation of a conic curve from acircle and (in mathematics) is formulated as ratio of distancebetween foci of an ellipse to the length of its major axisIn image processing eccentricity of an object 119874 is actuallyvalue of eccentricity of an ellipse with same second momentsas object 119874 [19]

120576 =(11989802minus 11989820)2

+ 411989811

(11989802+ 11989820)2

(3)

where 119898119901119902

denotes an image moment of (119901 119902) order How-ever such a formulation of eccentricity still serves the pur-pose of measuring circularity of an object

Haralickrsquos compactness is a measure for circularity of adigital figure defined as 120583

119877120590119877 where 119877 is a random variable

of the distance between the center of the figure to any part ofits perimeter [20]

8 BioMed Research International

Normalized discrete compactness 119862119863119873

measure is basedon counting the number of cell sides common to adjacentpixels of objectrsquos perimeter a measure called discrete com-pactness 119862

119863[21]

119862119863119873

=119862119863minus 119862119863min

119862119863max minus 119862

119863

=119875 minus 2119899 + 2

119875 + 4radic119899

119862119863=4119899 minus 119875

2 119862119863min = 119899 minus 1 119862

119863max =4119899 minus 4radic119899

2

(4)

where119862119863min119862119863max are respectively lower and upper bound

of discrete compactness of a shape composed of 119899 pixels and119875 is the perimeter of the digital region [21]

24 Color-Based Features The analysis of lesionrsquos colors is animportant source of information when determining lesionrsquostype as malignant lesions are characterized by a rich texture[22]

The following color-based features have been extractednumber of colors present within lesionrsquos area (together withinformation about the presence of two specific colors whiteand black) concentricity centroid distance and 119871

lowast

119886lowast

119887lowast

histogram distances [23 24]To determine the number of colors and to calculate the

concentricity an image has been converted to CIE 119871lowast

119886lowast

119887lowast

color space and then 119886lowast and 119887

lowast color channels have beenclustered into four clusters Three clustering algorithms havebeen tested 119896-means clustering kernel 119896-means clusteringand hierarchical agglomerative clustering [25 26] Kernel119896-means algorithm has been tested using Gaussian kernelfor 120590 isin 1 2 [26] In case of hierarchical agglomerativeclustering Wardrsquos minimum variance method based onEuclidean distance between clusters has been applied as thecriterion for choosing the pair of clusters to merge at eachstep [27]

The number of colors has been related to the greatestdistance between clustersrsquo centroids (each pair of clusters hadbeen considered) and it has been assumed that to identify twocolors as significantly different the distance between them in119886lowast

119887lowast space must exceed certain threshold value

119899colors = max(1 lfloor1120591max (119863)rfloor)

119863 = 119889119864(1198881 1198882) 1198881 1198882isin 119862 and 119888

1= 1198882

(5)

where 119862 is a set of clustersrsquo centroids The threshold value 120591has been set to 120591 = 12

It has been assumed that to state the presence of a certaincolor within the lesion the area of biggest cohesive area ofthat color must be greater than 1 of the area of the wholelesion Pixels have been recognized as white if 119871lowast gt 65 and asblack if 119871lowast lt 15 where 119871lowast is the value of luminosity channelfor the given pixel

Based on the aforementioned clusterization a measurecalled ldquoconcentricityrdquo is derived [23] To compute concen-tricity of an object its color clusters are first arranged inascending order regarding the area of their convex hull(ie segment 119878

1has the smallest area of convex hull and

segment 1198784the greatest) and then a few auxiliary indexes are

calculated The 119899(sdot) function stands for the number of pixelsof the given object

For segment 1198781 the percent area index PA is computed

[23]

PA =119899 (119862119862max)

119899 (1198781)

(6)

where 119862119862max is the largest cohesive set of pixels of 1198781 The PAdescribes how well the core is grouped

Let Hull denote the minimal convex area which includesthe entire pixels of 119878

2 The core inclusion CI is then defined

as [23]

CI =119899 (1198781capHull)

119899 (1198781)

(7)

The CI describes how well the second smallest area encirclesthe core

Let Core denote the minimal convex area which includesthe entire pixels of 119878

1 The hull exclusion HE is given by [23]

HE = 1 minus119899 (1198782cap Core)

119899 (1198782)

(8)

TheHEdescribes how exclusively the hull surrounds the coreFinally concentricity is defined as the product of the three

variables [23]

Concentricity = PA times CI timesHE

Concentricity isin [0 1]

(9)

Concentricity closer to one implies a better concentric struc-ture

Images of all regions used to compute concentricity havebeen compiled in Figure 10

The centroid distance for a color channel is defined asthe distance between the geometric centroid (of the binaryobject) and the brightness centroid of that channel Thebrightness centroid may be considered a center of mass ofan object whose density is determined by intensity valuesof its pixels If the pigmentation in a particular channel ishomogeneous the brightness centroid will be close to thegeometric centroid resulting in small value of the centroidaldistance

Color similarity of two regions has been quantified by the1198711- and 119871

2-norm histogram distances for CIE 119871

lowast

119886lowast

119887lowast color

space coarsely quantized into 4 times 8 times 8 bins [24]

1198711(119867119860 119867119861) =

4times8times8

sum

119894=1

1003816100381610038161003816119867119860 (119894) minus 119867119861(119894)1003816100381610038161003816

1198712(119867119860 119867119861) =

4times8times8

sum

119894=1

(119867119860(119894) minus 119867

119861(119894))2

(10)

where119867119860119867119861are histograms of areas 119860 and 119861 respectively

BioMed Research International 9

Object S1 S2 CCmax

Core S1 cap core Hull S2 cap hull

Figure 10 Regions used to compute concentricity of an object

25 Texture Description Quantitative properties of lesionrsquostexture were described with measures based on a gray levelcooccurrence matrix (GLCM) and a gray level run-lengthmatrix (GLRLM) [28 29]

GLCM is a square matrix 119875 where the (119894 119895)th entry of 119875represents information about the frequency of occurrence ofsuch two adjacent pixels where one of them has intensity 119894

and another has intensity 119895 Such informationmay be used toderive second-order statistical measurements characterizingexamined texture GLCM and six measures derived from it(contrast correlation energy homogeneity maximum prob-ability and dissimilarity) have been calculated as describedby Celebi et al [24]

Contrast = sum

119894119895

(119894 minus 119895)2

119875119894119895

Correlation = minussum

119894119895

(119894 minus 120583119909) (119895 minus 120583

119910)

120590119909120590119910

119875119894119895

Energy = sum

119894

sum

119895

[119875119894119895]2

Homogeneity = sum

119894

sum

119895

119875119894119895

1 + (119894 minus 119895)2

Maximum Probability = max119894119895

119875119894119895

Dissimilarity = sum

119894

sum

119895

119875119894119895sdot1003816100381610038161003816119894 minus 119895

1003816100381610038161003816

(11)

With GLRLM it is possible to analyze higher orderstatistical features for the given texture GLRLM is a two-dimensional matrix in which each element 119901(119894 119895 | 120579) gives

the total number of occurrences of runs of length 119895 at graylevel 119894 in a given direction 120579

In our study the direction of runs is irrelevant asdermoscopic camera has no fixed orientation when takingimages of lesionsmdashthere is no ldquoreference orientationrdquo Theonly thing that matters is texturersquos homogeneity thus insteadof calculating measures for individual orientations 120579 isin

0∘

45∘

90∘

135∘

all GLRLM computed for the aforemen-tioned orientations have been added together and mea-sures have been computed only using this new orientation-invariant matrix

GLRLM is used to derive 11measures Five basicmeasures(SRE LRE GLN RLN and RP) describe the distribution ofrunsrsquo lengths (SRE LRE RLN and RP) and runsrsquo intensity(GLN) [29] As SRE and LRE do not consider pixelsrsquo intensityLGRE andHGREmeasures have been proposed [30] FinallySRLGE SRHGE LRLGE and LRHGEmeasures are based onstatistical properties of a joint probability distribution of bothrunsrsquo intensity and length [31]

SRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198952

LRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

GLN =1

119899119903

119866

sum

119894=1

(

119877

sum

119895=1

119894119895)

2

RLN =1

119899119903

119877

sum

119895=1

(

119866

sum

119894=1

119894119895)

2

10 BioMed Research International

RP =119899119903

119899

LGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942

HGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

SRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942 sdot 1198952

SRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

1198952

LRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

1198942

LRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

1198952

119894119895

(12)

where 119899 is the number of image pixels with 119866 gray levels 119877 isthe maximal run length and 119899

119903is the total number of runs in

an image (119899119903= sum119866

119894=1sum119877

119895=1119894119895)

26 Asymmetry The observed asymmetry is a significantdiagnostic premise as in malignant lesions the arrangementof local structures (eg dots streaks and pigmentation nets)is nonuniform across the whole area of a lesion [32] Someresearchers point out that for instance sharp transitionsbetween central and border area indicatemalignancy [24 33]

Asymmetry measures may be defined in terms of ratiosof feature values across various lesionrsquos area segments or interms of changes of feature values between halves obtainedby splitting lesion area into halves using straight section linespassing through the center of mass [32 33]

In our study a different approach has been adopted Thelesion area has been first divided into sets of subregions inthree differentmanners (Figure 11) by splitting (1) into centraland border part (2) into halves along minor and major axisand (3) into quarters using same axes Then a variance offeature values has been calculated for each set of subregionsAdditionally asymmetry indices proposed by Stoecker et al[34] and the following quantities have been computed

119877119876119894

=119891119876119894

119891119882

(13)

where 119891119876119894

denotes value of feature 119891 calculated for 119894thquadrant and 119891

119882for the whole area respectively

27 Data Preparation In order to apply correlated-basefeature selection method (the choice of this particularmethod is discussed later) numerical features have beendiscretizedThis is due to the fact that in the studied problem

the decision feature is a categorical variable which meansit takes on discrete values and the relationship betweenvariables of those two classes cannot be directly assessedTheonly solution is to discretize the variable taking continuousvalues In general terms discretization is a simple logicalcondition which takes into account one or a few attributesand which aims at splitting a range of data into at least twosubsets Data in our study have been discretized accordingto the method proposed by Fayyad and Irani [35] whichuses heuristics minimizing entropy based on the ldquominimumdescription length principlerdquo

To ensure correct work of algorithms based on analysis ofdistances between points in the feature space (eg 119896-nearestneighbors algorithm SVM) for each feature its values havebeen scaled to the range of [minus1 1] Consequently the riskof a situation in which a feature with larger range of valuesdominates other features has been eliminated Normallydistributed features have been transformed using 119885-score(Equation (14)) whereas other features were rescaled linearlyto the desired range (15) [36] Consider

x =x minus x3119904

where x =1

119873sum

119894

x119894 119904 =

1

119873 minus 1(sum

119894

(x119894minus 120583)2

)

(14)

x = 2x119894minus xmin

xmax minus xminminus 1 (15)

where x is a vector of featurersquos values prior to normalizationand x is the same vector after normalization To determineif a given feature is well modeled by a normal distributionchi-square goodness-of-fit test has been applied

28 Class Imbalance Problem The data set used in thisstudy exhibits class imbalance problem which means thatthe classes are not approximately equally represented (ie itconsists of more than three times as many images of Clarknevus than images of blue nevus) In such case most clas-sifiers will focus on learning how to identify representativesof majority classes leading to poor predictive accuracy forminority classes When making a diagnosis such a situationis unacceptable as misclassification error may lead even topatientrsquos death

The most common way to address this issue is sampling[51] There exist two main types of sampling methodsundersampling which consists in removing representativesof the majority class and oversampling which consists inadding examples of the minority class Out of many samplingtechniques two methods are particularly popular randomundersampling and synthetic minority oversampling technique(SMOTE) In random undersampling method randomlydrawn representatives of the majority class are removed fromthe data set In SMOTE method new ldquosyntheticrdquo examplesare created by averaging a few representatives of the sameminority classes which are nearest neighbors in the featurespace [52]

BioMed Research International 11

(a) Whole area

(b) Inner and outer area

(c) Split along the major axis (d) Split along the minor axis

(e) Split along both major and minor axis simultaneously

Figure 11 A sample division of an object into subregions

In this study SMOTE method has been used as it hasbeen proven to be more effective than random undersam-pling when tested on a set of dermoscopic images obtainedfrom various sources without any common or rare lesiontypes omitted [24]

29 Feature Selection Feature selection consists in reducingdata dimensionality by rejecting redundant unimportant ornoisy features thus resulting in increased prediction accu-racy less complex classifier models and better computationefficiency

Feature selection algorithms may be divided into threemain categories filters wrappers and projections [53 54] Inthis study filter methods have been used for two reasons[24] Firstly as filters are usually fast it is possible to test

a huge number of combinations of features Secondly if onewould like to use wrappers on a given data set the targetlearning algorithm should demonstrate satisfactory resultsfor the original data set as wrappers are based on feedbackprinciple As some features extracted in this study might beirrelevant or redundant as well as due to class imbalancewrappers have not been likely to fulfill those restrictionsOn the other hand projections are used mainly to increasecomputational performance which was not the case in thisstudy

Out of numerous available filters three areworth noticingdue to their satisfactory performance on various data sets[53] ReliefFmutual information-based feature selection andcorrelation-based feature selection [55ndash57] As numerous fea-tures extracted in this study are strongly mutually correlated

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

8 BioMed Research International

Normalized discrete compactness 119862119863119873

measure is basedon counting the number of cell sides common to adjacentpixels of objectrsquos perimeter a measure called discrete com-pactness 119862

119863[21]

119862119863119873

=119862119863minus 119862119863min

119862119863max minus 119862

119863

=119875 minus 2119899 + 2

119875 + 4radic119899

119862119863=4119899 minus 119875

2 119862119863min = 119899 minus 1 119862

119863max =4119899 minus 4radic119899

2

(4)

where119862119863min119862119863max are respectively lower and upper bound

of discrete compactness of a shape composed of 119899 pixels and119875 is the perimeter of the digital region [21]

24 Color-Based Features The analysis of lesionrsquos colors is animportant source of information when determining lesionrsquostype as malignant lesions are characterized by a rich texture[22]

The following color-based features have been extractednumber of colors present within lesionrsquos area (together withinformation about the presence of two specific colors whiteand black) concentricity centroid distance and 119871

lowast

119886lowast

119887lowast

histogram distances [23 24]To determine the number of colors and to calculate the

concentricity an image has been converted to CIE 119871lowast

119886lowast

119887lowast

color space and then 119886lowast and 119887

lowast color channels have beenclustered into four clusters Three clustering algorithms havebeen tested 119896-means clustering kernel 119896-means clusteringand hierarchical agglomerative clustering [25 26] Kernel119896-means algorithm has been tested using Gaussian kernelfor 120590 isin 1 2 [26] In case of hierarchical agglomerativeclustering Wardrsquos minimum variance method based onEuclidean distance between clusters has been applied as thecriterion for choosing the pair of clusters to merge at eachstep [27]

The number of colors has been related to the greatestdistance between clustersrsquo centroids (each pair of clusters hadbeen considered) and it has been assumed that to identify twocolors as significantly different the distance between them in119886lowast

119887lowast space must exceed certain threshold value

119899colors = max(1 lfloor1120591max (119863)rfloor)

119863 = 119889119864(1198881 1198882) 1198881 1198882isin 119862 and 119888

1= 1198882

(5)

where 119862 is a set of clustersrsquo centroids The threshold value 120591has been set to 120591 = 12

It has been assumed that to state the presence of a certaincolor within the lesion the area of biggest cohesive area ofthat color must be greater than 1 of the area of the wholelesion Pixels have been recognized as white if 119871lowast gt 65 and asblack if 119871lowast lt 15 where 119871lowast is the value of luminosity channelfor the given pixel

Based on the aforementioned clusterization a measurecalled ldquoconcentricityrdquo is derived [23] To compute concen-tricity of an object its color clusters are first arranged inascending order regarding the area of their convex hull(ie segment 119878

1has the smallest area of convex hull and

segment 1198784the greatest) and then a few auxiliary indexes are

calculated The 119899(sdot) function stands for the number of pixelsof the given object

For segment 1198781 the percent area index PA is computed

[23]

PA =119899 (119862119862max)

119899 (1198781)

(6)

where 119862119862max is the largest cohesive set of pixels of 1198781 The PAdescribes how well the core is grouped

Let Hull denote the minimal convex area which includesthe entire pixels of 119878

2 The core inclusion CI is then defined

as [23]

CI =119899 (1198781capHull)

119899 (1198781)

(7)

The CI describes how well the second smallest area encirclesthe core

Let Core denote the minimal convex area which includesthe entire pixels of 119878

1 The hull exclusion HE is given by [23]

HE = 1 minus119899 (1198782cap Core)

119899 (1198782)

(8)

TheHEdescribes how exclusively the hull surrounds the coreFinally concentricity is defined as the product of the three

variables [23]

Concentricity = PA times CI timesHE

Concentricity isin [0 1]

(9)

Concentricity closer to one implies a better concentric struc-ture

Images of all regions used to compute concentricity havebeen compiled in Figure 10

The centroid distance for a color channel is defined asthe distance between the geometric centroid (of the binaryobject) and the brightness centroid of that channel Thebrightness centroid may be considered a center of mass ofan object whose density is determined by intensity valuesof its pixels If the pigmentation in a particular channel ishomogeneous the brightness centroid will be close to thegeometric centroid resulting in small value of the centroidaldistance

Color similarity of two regions has been quantified by the1198711- and 119871

2-norm histogram distances for CIE 119871

lowast

119886lowast

119887lowast color

space coarsely quantized into 4 times 8 times 8 bins [24]

1198711(119867119860 119867119861) =

4times8times8

sum

119894=1

1003816100381610038161003816119867119860 (119894) minus 119867119861(119894)1003816100381610038161003816

1198712(119867119860 119867119861) =

4times8times8

sum

119894=1

(119867119860(119894) minus 119867

119861(119894))2

(10)

where119867119860119867119861are histograms of areas 119860 and 119861 respectively

BioMed Research International 9

Object S1 S2 CCmax

Core S1 cap core Hull S2 cap hull

Figure 10 Regions used to compute concentricity of an object

25 Texture Description Quantitative properties of lesionrsquostexture were described with measures based on a gray levelcooccurrence matrix (GLCM) and a gray level run-lengthmatrix (GLRLM) [28 29]

GLCM is a square matrix 119875 where the (119894 119895)th entry of 119875represents information about the frequency of occurrence ofsuch two adjacent pixels where one of them has intensity 119894

and another has intensity 119895 Such informationmay be used toderive second-order statistical measurements characterizingexamined texture GLCM and six measures derived from it(contrast correlation energy homogeneity maximum prob-ability and dissimilarity) have been calculated as describedby Celebi et al [24]

Contrast = sum

119894119895

(119894 minus 119895)2

119875119894119895

Correlation = minussum

119894119895

(119894 minus 120583119909) (119895 minus 120583

119910)

120590119909120590119910

119875119894119895

Energy = sum

119894

sum

119895

[119875119894119895]2

Homogeneity = sum

119894

sum

119895

119875119894119895

1 + (119894 minus 119895)2

Maximum Probability = max119894119895

119875119894119895

Dissimilarity = sum

119894

sum

119895

119875119894119895sdot1003816100381610038161003816119894 minus 119895

1003816100381610038161003816

(11)

With GLRLM it is possible to analyze higher orderstatistical features for the given texture GLRLM is a two-dimensional matrix in which each element 119901(119894 119895 | 120579) gives

the total number of occurrences of runs of length 119895 at graylevel 119894 in a given direction 120579

In our study the direction of runs is irrelevant asdermoscopic camera has no fixed orientation when takingimages of lesionsmdashthere is no ldquoreference orientationrdquo Theonly thing that matters is texturersquos homogeneity thus insteadof calculating measures for individual orientations 120579 isin

0∘

45∘

90∘

135∘

all GLRLM computed for the aforemen-tioned orientations have been added together and mea-sures have been computed only using this new orientation-invariant matrix

GLRLM is used to derive 11measures Five basicmeasures(SRE LRE GLN RLN and RP) describe the distribution ofrunsrsquo lengths (SRE LRE RLN and RP) and runsrsquo intensity(GLN) [29] As SRE and LRE do not consider pixelsrsquo intensityLGRE andHGREmeasures have been proposed [30] FinallySRLGE SRHGE LRLGE and LRHGEmeasures are based onstatistical properties of a joint probability distribution of bothrunsrsquo intensity and length [31]

SRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198952

LRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

GLN =1

119899119903

119866

sum

119894=1

(

119877

sum

119895=1

119894119895)

2

RLN =1

119899119903

119877

sum

119895=1

(

119866

sum

119894=1

119894119895)

2

10 BioMed Research International

RP =119899119903

119899

LGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942

HGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

SRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942 sdot 1198952

SRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

1198952

LRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

1198942

LRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

1198952

119894119895

(12)

where 119899 is the number of image pixels with 119866 gray levels 119877 isthe maximal run length and 119899

119903is the total number of runs in

an image (119899119903= sum119866

119894=1sum119877

119895=1119894119895)

26 Asymmetry The observed asymmetry is a significantdiagnostic premise as in malignant lesions the arrangementof local structures (eg dots streaks and pigmentation nets)is nonuniform across the whole area of a lesion [32] Someresearchers point out that for instance sharp transitionsbetween central and border area indicatemalignancy [24 33]

Asymmetry measures may be defined in terms of ratiosof feature values across various lesionrsquos area segments or interms of changes of feature values between halves obtainedby splitting lesion area into halves using straight section linespassing through the center of mass [32 33]

In our study a different approach has been adopted Thelesion area has been first divided into sets of subregions inthree differentmanners (Figure 11) by splitting (1) into centraland border part (2) into halves along minor and major axisand (3) into quarters using same axes Then a variance offeature values has been calculated for each set of subregionsAdditionally asymmetry indices proposed by Stoecker et al[34] and the following quantities have been computed

119877119876119894

=119891119876119894

119891119882

(13)

where 119891119876119894

denotes value of feature 119891 calculated for 119894thquadrant and 119891

119882for the whole area respectively

27 Data Preparation In order to apply correlated-basefeature selection method (the choice of this particularmethod is discussed later) numerical features have beendiscretizedThis is due to the fact that in the studied problem

the decision feature is a categorical variable which meansit takes on discrete values and the relationship betweenvariables of those two classes cannot be directly assessedTheonly solution is to discretize the variable taking continuousvalues In general terms discretization is a simple logicalcondition which takes into account one or a few attributesand which aims at splitting a range of data into at least twosubsets Data in our study have been discretized accordingto the method proposed by Fayyad and Irani [35] whichuses heuristics minimizing entropy based on the ldquominimumdescription length principlerdquo

To ensure correct work of algorithms based on analysis ofdistances between points in the feature space (eg 119896-nearestneighbors algorithm SVM) for each feature its values havebeen scaled to the range of [minus1 1] Consequently the riskof a situation in which a feature with larger range of valuesdominates other features has been eliminated Normallydistributed features have been transformed using 119885-score(Equation (14)) whereas other features were rescaled linearlyto the desired range (15) [36] Consider

x =x minus x3119904

where x =1

119873sum

119894

x119894 119904 =

1

119873 minus 1(sum

119894

(x119894minus 120583)2

)

(14)

x = 2x119894minus xmin

xmax minus xminminus 1 (15)

where x is a vector of featurersquos values prior to normalizationand x is the same vector after normalization To determineif a given feature is well modeled by a normal distributionchi-square goodness-of-fit test has been applied

28 Class Imbalance Problem The data set used in thisstudy exhibits class imbalance problem which means thatthe classes are not approximately equally represented (ie itconsists of more than three times as many images of Clarknevus than images of blue nevus) In such case most clas-sifiers will focus on learning how to identify representativesof majority classes leading to poor predictive accuracy forminority classes When making a diagnosis such a situationis unacceptable as misclassification error may lead even topatientrsquos death

The most common way to address this issue is sampling[51] There exist two main types of sampling methodsundersampling which consists in removing representativesof the majority class and oversampling which consists inadding examples of the minority class Out of many samplingtechniques two methods are particularly popular randomundersampling and synthetic minority oversampling technique(SMOTE) In random undersampling method randomlydrawn representatives of the majority class are removed fromthe data set In SMOTE method new ldquosyntheticrdquo examplesare created by averaging a few representatives of the sameminority classes which are nearest neighbors in the featurespace [52]

BioMed Research International 11

(a) Whole area

(b) Inner and outer area

(c) Split along the major axis (d) Split along the minor axis

(e) Split along both major and minor axis simultaneously

Figure 11 A sample division of an object into subregions

In this study SMOTE method has been used as it hasbeen proven to be more effective than random undersam-pling when tested on a set of dermoscopic images obtainedfrom various sources without any common or rare lesiontypes omitted [24]

29 Feature Selection Feature selection consists in reducingdata dimensionality by rejecting redundant unimportant ornoisy features thus resulting in increased prediction accu-racy less complex classifier models and better computationefficiency

Feature selection algorithms may be divided into threemain categories filters wrappers and projections [53 54] Inthis study filter methods have been used for two reasons[24] Firstly as filters are usually fast it is possible to test

a huge number of combinations of features Secondly if onewould like to use wrappers on a given data set the targetlearning algorithm should demonstrate satisfactory resultsfor the original data set as wrappers are based on feedbackprinciple As some features extracted in this study might beirrelevant or redundant as well as due to class imbalancewrappers have not been likely to fulfill those restrictionsOn the other hand projections are used mainly to increasecomputational performance which was not the case in thisstudy

Out of numerous available filters three areworth noticingdue to their satisfactory performance on various data sets[53] ReliefFmutual information-based feature selection andcorrelation-based feature selection [55ndash57] As numerous fea-tures extracted in this study are strongly mutually correlated

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

BioMed Research International 9

Object S1 S2 CCmax

Core S1 cap core Hull S2 cap hull

Figure 10 Regions used to compute concentricity of an object

25 Texture Description Quantitative properties of lesionrsquostexture were described with measures based on a gray levelcooccurrence matrix (GLCM) and a gray level run-lengthmatrix (GLRLM) [28 29]

GLCM is a square matrix 119875 where the (119894 119895)th entry of 119875represents information about the frequency of occurrence ofsuch two adjacent pixels where one of them has intensity 119894

and another has intensity 119895 Such informationmay be used toderive second-order statistical measurements characterizingexamined texture GLCM and six measures derived from it(contrast correlation energy homogeneity maximum prob-ability and dissimilarity) have been calculated as describedby Celebi et al [24]

Contrast = sum

119894119895

(119894 minus 119895)2

119875119894119895

Correlation = minussum

119894119895

(119894 minus 120583119909) (119895 minus 120583

119910)

120590119909120590119910

119875119894119895

Energy = sum

119894

sum

119895

[119875119894119895]2

Homogeneity = sum

119894

sum

119895

119875119894119895

1 + (119894 minus 119895)2

Maximum Probability = max119894119895

119875119894119895

Dissimilarity = sum

119894

sum

119895

119875119894119895sdot1003816100381610038161003816119894 minus 119895

1003816100381610038161003816

(11)

With GLRLM it is possible to analyze higher orderstatistical features for the given texture GLRLM is a two-dimensional matrix in which each element 119901(119894 119895 | 120579) gives

the total number of occurrences of runs of length 119895 at graylevel 119894 in a given direction 120579

In our study the direction of runs is irrelevant asdermoscopic camera has no fixed orientation when takingimages of lesionsmdashthere is no ldquoreference orientationrdquo Theonly thing that matters is texturersquos homogeneity thus insteadof calculating measures for individual orientations 120579 isin

0∘

45∘

90∘

135∘

all GLRLM computed for the aforemen-tioned orientations have been added together and mea-sures have been computed only using this new orientation-invariant matrix

GLRLM is used to derive 11measures Five basicmeasures(SRE LRE GLN RLN and RP) describe the distribution ofrunsrsquo lengths (SRE LRE RLN and RP) and runsrsquo intensity(GLN) [29] As SRE and LRE do not consider pixelsrsquo intensityLGRE andHGREmeasures have been proposed [30] FinallySRLGE SRHGE LRLGE and LRHGEmeasures are based onstatistical properties of a joint probability distribution of bothrunsrsquo intensity and length [31]

SRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198952

LRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

GLN =1

119899119903

119866

sum

119894=1

(

119877

sum

119895=1

119894119895)

2

RLN =1

119899119903

119877

sum

119895=1

(

119866

sum

119894=1

119894119895)

2

10 BioMed Research International

RP =119899119903

119899

LGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942

HGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

SRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942 sdot 1198952

SRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

1198952

LRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

1198942

LRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

1198952

119894119895

(12)

where 119899 is the number of image pixels with 119866 gray levels 119877 isthe maximal run length and 119899

119903is the total number of runs in

an image (119899119903= sum119866

119894=1sum119877

119895=1119894119895)

26 Asymmetry The observed asymmetry is a significantdiagnostic premise as in malignant lesions the arrangementof local structures (eg dots streaks and pigmentation nets)is nonuniform across the whole area of a lesion [32] Someresearchers point out that for instance sharp transitionsbetween central and border area indicatemalignancy [24 33]

Asymmetry measures may be defined in terms of ratiosof feature values across various lesionrsquos area segments or interms of changes of feature values between halves obtainedby splitting lesion area into halves using straight section linespassing through the center of mass [32 33]

In our study a different approach has been adopted Thelesion area has been first divided into sets of subregions inthree differentmanners (Figure 11) by splitting (1) into centraland border part (2) into halves along minor and major axisand (3) into quarters using same axes Then a variance offeature values has been calculated for each set of subregionsAdditionally asymmetry indices proposed by Stoecker et al[34] and the following quantities have been computed

119877119876119894

=119891119876119894

119891119882

(13)

where 119891119876119894

denotes value of feature 119891 calculated for 119894thquadrant and 119891

119882for the whole area respectively

27 Data Preparation In order to apply correlated-basefeature selection method (the choice of this particularmethod is discussed later) numerical features have beendiscretizedThis is due to the fact that in the studied problem

the decision feature is a categorical variable which meansit takes on discrete values and the relationship betweenvariables of those two classes cannot be directly assessedTheonly solution is to discretize the variable taking continuousvalues In general terms discretization is a simple logicalcondition which takes into account one or a few attributesand which aims at splitting a range of data into at least twosubsets Data in our study have been discretized accordingto the method proposed by Fayyad and Irani [35] whichuses heuristics minimizing entropy based on the ldquominimumdescription length principlerdquo

To ensure correct work of algorithms based on analysis ofdistances between points in the feature space (eg 119896-nearestneighbors algorithm SVM) for each feature its values havebeen scaled to the range of [minus1 1] Consequently the riskof a situation in which a feature with larger range of valuesdominates other features has been eliminated Normallydistributed features have been transformed using 119885-score(Equation (14)) whereas other features were rescaled linearlyto the desired range (15) [36] Consider

x =x minus x3119904

where x =1

119873sum

119894

x119894 119904 =

1

119873 minus 1(sum

119894

(x119894minus 120583)2

)

(14)

x = 2x119894minus xmin

xmax minus xminminus 1 (15)

where x is a vector of featurersquos values prior to normalizationand x is the same vector after normalization To determineif a given feature is well modeled by a normal distributionchi-square goodness-of-fit test has been applied

28 Class Imbalance Problem The data set used in thisstudy exhibits class imbalance problem which means thatthe classes are not approximately equally represented (ie itconsists of more than three times as many images of Clarknevus than images of blue nevus) In such case most clas-sifiers will focus on learning how to identify representativesof majority classes leading to poor predictive accuracy forminority classes When making a diagnosis such a situationis unacceptable as misclassification error may lead even topatientrsquos death

The most common way to address this issue is sampling[51] There exist two main types of sampling methodsundersampling which consists in removing representativesof the majority class and oversampling which consists inadding examples of the minority class Out of many samplingtechniques two methods are particularly popular randomundersampling and synthetic minority oversampling technique(SMOTE) In random undersampling method randomlydrawn representatives of the majority class are removed fromthe data set In SMOTE method new ldquosyntheticrdquo examplesare created by averaging a few representatives of the sameminority classes which are nearest neighbors in the featurespace [52]

BioMed Research International 11

(a) Whole area

(b) Inner and outer area

(c) Split along the major axis (d) Split along the minor axis

(e) Split along both major and minor axis simultaneously

Figure 11 A sample division of an object into subregions

In this study SMOTE method has been used as it hasbeen proven to be more effective than random undersam-pling when tested on a set of dermoscopic images obtainedfrom various sources without any common or rare lesiontypes omitted [24]

29 Feature Selection Feature selection consists in reducingdata dimensionality by rejecting redundant unimportant ornoisy features thus resulting in increased prediction accu-racy less complex classifier models and better computationefficiency

Feature selection algorithms may be divided into threemain categories filters wrappers and projections [53 54] Inthis study filter methods have been used for two reasons[24] Firstly as filters are usually fast it is possible to test

a huge number of combinations of features Secondly if onewould like to use wrappers on a given data set the targetlearning algorithm should demonstrate satisfactory resultsfor the original data set as wrappers are based on feedbackprinciple As some features extracted in this study might beirrelevant or redundant as well as due to class imbalancewrappers have not been likely to fulfill those restrictionsOn the other hand projections are used mainly to increasecomputational performance which was not the case in thisstudy

Out of numerous available filters three areworth noticingdue to their satisfactory performance on various data sets[53] ReliefFmutual information-based feature selection andcorrelation-based feature selection [55ndash57] As numerous fea-tures extracted in this study are strongly mutually correlated

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

10 BioMed Research International

RP =119899119903

119899

LGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942

HGRE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

SRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

119894119895

1198942 sdot 1198952

SRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

119894119895

1198952

LRLGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198952

119894119895

1198942

LRHGE =1

119899119903

119866

sum

119894=1

119877

sum

119895=1

1198942

1198952

119894119895

(12)

where 119899 is the number of image pixels with 119866 gray levels 119877 isthe maximal run length and 119899

119903is the total number of runs in

an image (119899119903= sum119866

119894=1sum119877

119895=1119894119895)

26 Asymmetry The observed asymmetry is a significantdiagnostic premise as in malignant lesions the arrangementof local structures (eg dots streaks and pigmentation nets)is nonuniform across the whole area of a lesion [32] Someresearchers point out that for instance sharp transitionsbetween central and border area indicatemalignancy [24 33]

Asymmetry measures may be defined in terms of ratiosof feature values across various lesionrsquos area segments or interms of changes of feature values between halves obtainedby splitting lesion area into halves using straight section linespassing through the center of mass [32 33]

In our study a different approach has been adopted Thelesion area has been first divided into sets of subregions inthree differentmanners (Figure 11) by splitting (1) into centraland border part (2) into halves along minor and major axisand (3) into quarters using same axes Then a variance offeature values has been calculated for each set of subregionsAdditionally asymmetry indices proposed by Stoecker et al[34] and the following quantities have been computed

119877119876119894

=119891119876119894

119891119882

(13)

where 119891119876119894

denotes value of feature 119891 calculated for 119894thquadrant and 119891

119882for the whole area respectively

27 Data Preparation In order to apply correlated-basefeature selection method (the choice of this particularmethod is discussed later) numerical features have beendiscretizedThis is due to the fact that in the studied problem

the decision feature is a categorical variable which meansit takes on discrete values and the relationship betweenvariables of those two classes cannot be directly assessedTheonly solution is to discretize the variable taking continuousvalues In general terms discretization is a simple logicalcondition which takes into account one or a few attributesand which aims at splitting a range of data into at least twosubsets Data in our study have been discretized accordingto the method proposed by Fayyad and Irani [35] whichuses heuristics minimizing entropy based on the ldquominimumdescription length principlerdquo

To ensure correct work of algorithms based on analysis ofdistances between points in the feature space (eg 119896-nearestneighbors algorithm SVM) for each feature its values havebeen scaled to the range of [minus1 1] Consequently the riskof a situation in which a feature with larger range of valuesdominates other features has been eliminated Normallydistributed features have been transformed using 119885-score(Equation (14)) whereas other features were rescaled linearlyto the desired range (15) [36] Consider

x =x minus x3119904

where x =1

119873sum

119894

x119894 119904 =

1

119873 minus 1(sum

119894

(x119894minus 120583)2

)

(14)

x = 2x119894minus xmin

xmax minus xminminus 1 (15)

where x is a vector of featurersquos values prior to normalizationand x is the same vector after normalization To determineif a given feature is well modeled by a normal distributionchi-square goodness-of-fit test has been applied

28 Class Imbalance Problem The data set used in thisstudy exhibits class imbalance problem which means thatthe classes are not approximately equally represented (ie itconsists of more than three times as many images of Clarknevus than images of blue nevus) In such case most clas-sifiers will focus on learning how to identify representativesof majority classes leading to poor predictive accuracy forminority classes When making a diagnosis such a situationis unacceptable as misclassification error may lead even topatientrsquos death

The most common way to address this issue is sampling[51] There exist two main types of sampling methodsundersampling which consists in removing representativesof the majority class and oversampling which consists inadding examples of the minority class Out of many samplingtechniques two methods are particularly popular randomundersampling and synthetic minority oversampling technique(SMOTE) In random undersampling method randomlydrawn representatives of the majority class are removed fromthe data set In SMOTE method new ldquosyntheticrdquo examplesare created by averaging a few representatives of the sameminority classes which are nearest neighbors in the featurespace [52]

BioMed Research International 11

(a) Whole area

(b) Inner and outer area

(c) Split along the major axis (d) Split along the minor axis

(e) Split along both major and minor axis simultaneously

Figure 11 A sample division of an object into subregions

In this study SMOTE method has been used as it hasbeen proven to be more effective than random undersam-pling when tested on a set of dermoscopic images obtainedfrom various sources without any common or rare lesiontypes omitted [24]

29 Feature Selection Feature selection consists in reducingdata dimensionality by rejecting redundant unimportant ornoisy features thus resulting in increased prediction accu-racy less complex classifier models and better computationefficiency

Feature selection algorithms may be divided into threemain categories filters wrappers and projections [53 54] Inthis study filter methods have been used for two reasons[24] Firstly as filters are usually fast it is possible to test

a huge number of combinations of features Secondly if onewould like to use wrappers on a given data set the targetlearning algorithm should demonstrate satisfactory resultsfor the original data set as wrappers are based on feedbackprinciple As some features extracted in this study might beirrelevant or redundant as well as due to class imbalancewrappers have not been likely to fulfill those restrictionsOn the other hand projections are used mainly to increasecomputational performance which was not the case in thisstudy

Out of numerous available filters three areworth noticingdue to their satisfactory performance on various data sets[53] ReliefFmutual information-based feature selection andcorrelation-based feature selection [55ndash57] As numerous fea-tures extracted in this study are strongly mutually correlated

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

BioMed Research International 11

(a) Whole area

(b) Inner and outer area

(c) Split along the major axis (d) Split along the minor axis

(e) Split along both major and minor axis simultaneously

Figure 11 A sample division of an object into subregions

In this study SMOTE method has been used as it hasbeen proven to be more effective than random undersam-pling when tested on a set of dermoscopic images obtainedfrom various sources without any common or rare lesiontypes omitted [24]

29 Feature Selection Feature selection consists in reducingdata dimensionality by rejecting redundant unimportant ornoisy features thus resulting in increased prediction accu-racy less complex classifier models and better computationefficiency

Feature selection algorithms may be divided into threemain categories filters wrappers and projections [53 54] Inthis study filter methods have been used for two reasons[24] Firstly as filters are usually fast it is possible to test

a huge number of combinations of features Secondly if onewould like to use wrappers on a given data set the targetlearning algorithm should demonstrate satisfactory resultsfor the original data set as wrappers are based on feedbackprinciple As some features extracted in this study might beirrelevant or redundant as well as due to class imbalancewrappers have not been likely to fulfill those restrictionsOn the other hand projections are used mainly to increasecomputational performance which was not the case in thisstudy

Out of numerous available filters three areworth noticingdue to their satisfactory performance on various data sets[53] ReliefFmutual information-based feature selection andcorrelation-based feature selection [55ndash57] As numerous fea-tures extracted in this study are strongly mutually correlated

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 12: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

12 BioMed Research International

correlation-based feature selection has been chosen as theonly method which takes into account not only relationshipsbetween features and the decision class but also relationshipsbetween features themselves

The heuristic ldquomeritrdquo Merit119878of a feature subset 119878 contain-

ing 119896 features is given by [58]

Merit119878=

119896119903cf

radic119896 + 119896 (119896 minus 1) 119903ff (16)

where 119903cf is the average feature-class correlation and 119903ff theaverage feature-feature intercorrelation

To compare two feature vectors of discrete values thesymmetric uncertainty an improved version of informationgain measure has been used [59] Symmetric uncertainty isgiven by

SU (119883 119884) = 2 [119867 (119883) + 119867 (119884) minus 119867 (119883 119884)

119867 (119884) + 119867 (119883)] (17)

where 119867(119860) is the marginal entropy of set 119860 and 119867(119860 119861) isthe joint entropy of sets 119860 and 119861

The number of selected features is a parameter whichrequires a careful tuning too few features results may pre-vent classifiers from distinguishing between various classeswhereas too many features impose risk of overfittingmdashasituation when a model excels in classifying training databut fails to generalize knowledge and hence misclassifies newsamples As some researchers suggest limiting the number offeatures 119896 to the range of 5 le 119896 le 30 [24] in this study 119896 = 20

has been adoptedThe applied procedure allowed choosing features which

best discriminate lesion types concentricity computed using119896-means algorithm concentricity computed using kernel 119896-means algorithm for 120590 = 1 119871lowast119886lowast119887lowast histogram distancesin 1198711and 119871

2metrics computed for central and border

part variances from sets of 119871lowast

119886lowast

119887lowast histogram distances

between the whole region and individual quarters in both1198711and 119871

2metrics variances from sets of 119871lowast119886lowast119887lowast histogram

distances between each pair of quarters in both 1198711and 119871

2

metrics variances from sets of GLCM features (dissimilar-ity maximum probability entropy energy correlation andcontrast) computed for quarters variance of variances of119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) calculated for

quarters variance of variances of H channel values (in HSVcolor space) calculated for quarters mean value of variancesof 119886lowast channel values (in 119871

lowast

119886lowast

119887lowast color space) for quarters

variance from sets of 119871lowast119886lowast119887lowast histogram distances betweenhalves in both 119871

1and 119871

2metrics and variance from sets of

GLCM dissimilarity computed for halvesThis particular choice of features does not diverge from

the clinical practice according to which asymmetry is one ofthe most important premises in determining the lesion type

210 Classification

2101 Models The following predictive models have beentested [60 61] 119896-nearest neighbors algorithm (for 119896 isin

3 5 10 15) logistic regression decision tree and supportvector machine (SVM) Other models like neural networks

or rule-based systems have not been a subject of this study[62] In case of the decision tree Gini-Simpson index hasbeen used as a measure of data diversity [63] Decisionboundary of SVMs has been determined using radial basisfunction as kernels Radial basis function has been preferredover linear sigmoid and polynomial kernels as it exhibits afew important properties [24] it allows classifying nonlin-early separable data sets (as is the case in this study) it ischaracterized by high numerical stability and it is definedusing only one parameter Moreover as in this study there areas many as four decision classes the multiclass classificationproblem has been decomposed into a (greater) number ofbinary classification problems Winner-takes-all strategy hasbeen applied to combine solutions of subproblems into afinal solution as for small data sets its results are similarto the results of the best strategymdashpairwise couplingmdashwhileat the same time its computational burden is much lower[64 65]

Individual classifiers have been trained using typicalparameters It has been assumed that the cost of misclassify-ing melanoma as nevi is four times higher than another sortof misclassification Initial experimental results have provenSVM to be the most effective classifier (similar observationshave been made by other research teams [60 66]) Con-sequently further experiments were focused on improvingperformance of SVMs by fine-tuning their parameters

SVMwith radial basis function kernel is governed by twoparametersmdashmisclassification cost 119862 and kernel width 120574The task is to select those parameters in such a way that theaccuracy of predictions for new data (data not used in thelearning process) is maximal

As values of only two parameters have had to be adjustedgrid search has been applied [67] For each parametervalues from exponential series have been considered 119862 isin

2minus5

2minus3

215

and 120574 isin 2minus15

2minus13

23

Each com-bination of parameter values (119862

0 1205740) has been assessed

using a validation procedure described below After the gridsearch had finished SVM have been trained using optimalparameter values (119862lowast 120574lowast)

2102 Validation The aim of the validation step is to assessclassifierrsquos quality based on the number of generalizationerrors that is misclassified samples Optimal SVM modelshave been validated using stratified Monte Carlo cross vali-dation method and in each iteration the test set consisted of10 of samples

The Monte Carlo variant of cross validation had beenchosen for a number of reasons Firstly experimental resultsobtained for a similar problem [24] suggest high effectivenessof this method Secondly in their analysis Molinaro et al[68] point out that for datasets consisting of few sampleswith many features such as a dataset used in this studythe difference in effectiveness between Monte Carlo andldquoclassicrdquo 10-fold cross validation is insignificant The numberof samples drawn into the test set has been determined usinga ldquorule of thumbrdquo [69] Finally by applying Monte Carlocross validation one may avoid constructing overdevel-oped predictive models which decreases the risk of overfit-ting

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 13: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

BioMed Research International 13

As in the data set used in this study there is a considerabledifference in the number of representatives between variousdecision classes and stratification has been applied Stratifica-tion ensures the similar distribution of representatives of eachdecision class in both training set and test set in each iterationof the validation procedure

3 Results and Discussion

31 Database Specification The described algorithm for theautomatic classification of specific melanocytic lesions hasbeen tested on dermoscopic images from a widely usedInteractive Atlas of Dermoscopy [2] Images for this atlashave been provided by two university hospitals (Universityof Naples Italy and University of Graz Austria) and storedon a CD-ROM in the JPEG format The documentation ofeach dermoscopic image was performed using a Dermaphotapparatus (Heine Optotechnik Herrsching Germany) anda photo camera (Nikon F3) mounted on a stereomicroscope(Wild M650 Heerbrugg AG Switzerland) in order to pro-duce digitized ELM images of skin lesions All the imageshave been assessed manually by a dermoscopic expert withan extensive clinical experience

Furthermore all the descriptions of skin cases were basedon the histopathological examination of the biopsy materialIn order to develop and test the automatic procedure for theclassification of melanocytic skin lesions 300 images withdifferent resolutions ranging from 0033 to 05mmpixelwere chosen The database included 100 Clark nevus cases70 blue nevus cases 70 Spitz nevus cases and 60 malignantmelanomas

The preprocessing step (black frame removal and hairremoval) as well as the segmentation step (border error lessthan 6) did not affect the further research [14]

32 Statistical Analysis The performance of a classifier canbe assessed based on the analysis of discrepancies in classifi-cation that is differences between the classification carriedout by the classifier and the actual classification (groundtruth) summarized in a confusion matrix In this matrixeach row refers to actual classes 119888(119909) as recorded in the testset and each column refers to classes as predicted by theclassifier (119909) The (119894 119895)th element contains the number oftest instances predicted by a classifier to belong to 119895th classclass whereas they are actually representatives of 119894th class

From a contingency table we can calculate three perfor-mance indicators accuracy true positive rate (TPR) and truenegative rate (TNR)

For a binary classification problem those indicators aregiven by [70]

Accuracy = 1

Tesum

119909isinTe119868 [ (119909) = 119888 (119909)]

TPR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = oplus]

sum119909isinTe 119868 [119888 (119909) = oplus]

TNR =sum119909isinTe 119868 [ (119909) = 119888 (119909) = ⊖]

sum119909isinTe 119868 [119888 (119909) = ⊖]

(18)

where Te is the test set and the function 119868[sdot] denotes theindicator function Positive and negative classes are denotedby oplus and ⊖ respectively

Although the initial problem is a multiclass classificationproblem (with four classes) it can be decomposed into fourbinary classification problems whether an instance belongsto the given class or not

When dealing with a data set exhibiting class imbalanceproblem the accuracy should be considered only a prelim-inary performance indicator For instance for a data setwith class ratio 99 1 a classifier which simply counts eachinstance as a representative of the majority class would scorethe accuracy of 99 Therefore in such cases better measurewould be a plot of the ROC curve and the area under thatcurve

The ROC plot completely visualizes the (normalized)contingency table by means of plot in the unit squarewith TPR rate on the 119910-axis and TNR rate on the 119909-axisEach of the points marked on the ROC plot specifies theclassification performance in terms of true and false positiverates achieved by the corresponding score thresholds In ourstudy posterior probabilities have been used as scores Thosepoints are connected by straight lines producing a piecewiselinear curve the ROC curve that rises monotonically from(0 0) to (1 1) The ROC plot can be interpreted as a plotof costs-to-benefits ratio To measure the performance of aclassifier using a ROC plot the area under that plot curve(AUC) is calculated

Values of aforementioned performance indicators forall examined classifiers have been summarized in Table 2Additionally Figure 12 presents ROC plots for a malignant-or-benign classification

Among all examined classifiers SVM achieved definitelybest results It achieved highest overall accuracy ACCALL =

09296 whereas for the second-best classifier the logisticregression ACCALL = 08028 The SVM outperformed thelogistic regression in classifying all four lesion types It isto be stressed that the SVM turned out to be the mosteffective classifier in recognizing malignant melanoma themost dangerous of all four lesions Its true positive rate andarea under the curve amounted to TPRMM = 08611 andAUCMM = 09688 respectively A little bit lower true negativerate for the SVM than for the decision tree is not a problemin the studied area It indicates misclassification of Clarkor Spitz nevi as malignant melanoma and the cost of sucha misdiagnosis is much much lower than of classifying amalignant tumor as a benign one

The achieved results are slightly better thanmost results ofmalignant-or-benign studies conducted so far Our methodallowed classifying malignant lesions with TPRMM = 08611

and TNRMM = 09623 whereas for other similar studies TPRfalls within the range 08ndash10 andTNRndash05ndash095 respectively[66 71]

A truly reliable diagnosis can be made only by ahistopathological examination of a lesion When in theirstudies Curley et al asked three experienced dermatologiststo classify lesions based only on the visual examination thosephysicians identified correctly only half of cases Therefore

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 14: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

14 BioMed Research International

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(a) Logit (AUC = 09469)

0

02

04

06

08

1

TPR

02 04 06 08 101 minus TNR

(b) Tree (AUC = 09555)

0

02

04

06

08

1

TPR

104 06 080201 minus TNR

(c) 3NN (AUC = 09152)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(d) 5NN (AUC = 08746)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(e) 10NN (AUC = 07860)

02 04 06 08 101 minus TNR

0

02

04

06

08

1

TPR

(f) 15NN (AUC = 07378)

02 04 06 080 11 minus TNR

0

02

04

06

08

1

TPR

(g) SVM (AUC = 09688)

Figure 12 ROC plots for individual classifiers for a malignant-or-benign classification

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 15: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

BioMed Research International 15

Table 2 The assessment of classifiers for a 1-vs-all and overall classification

Measure Logit Tree 3NN 5NN 10NN 15NN SVMTPRBN 09730 08485 07200 07660 06939 06735 10000TNRBN 10000 09266 10000 10000 09785 09677 10000AUCBN 10000 09675 09914 09889 09679 09659 10000TPRCN 08611 06977 08519 07586 08148 08182 09143TNRCN 09528 09394 08870 08761 08783 08500 09626AUCCN 09817 09279 09615 09089 09099 09030 09851TPRMM 08387 09655 07419 06786 06071 05185 08611TNRMM 09189 09381 08919 08596 08421 08174 09623AUCMM 09469 09555 09152 08746 07860 07378 09688TPRSN 08421 07568 08235 07105 06053 05227 09429TNRSN 09712 09333 09352 09231 08846 08776 09813AUCSN 09648 09425 09606 09395 08812 08545 09789ACCALL 08803 08028 07746 07324 06761 06197 09296BN blue nevus CN Clark nevus MM malignant melanoma SN Spitz nevus ALL multiclass classification

the presented method of classification yields better resultsthan those obtained during a medical examination

4 Conclusions

This paper presents a computer-aided approach for theautomatic classification of melanocytic lesions and accuratediagnosis of melanoma In our research we evaluated ourapproach with four types of classifiers 119896-nearest neighborsalgorithm logistic regression decision tree and SVM Thedeveloped system achieved 92 accuracy with SVM

Our study gives an important contribution to the researcharea of skin lesion classification for several reasons Firstly itfocuses on specifying the exact type of a skin lesion and notonly on categorization of skin lesions as benign or malignantSecondly this work combines the results of research doneso far related to all the steps needed for the developmentof an automatic diagnostic system for melanocytic lesiondetection and classification Finally the refinement of currentapproaches and development of new techniques andmethodswill help to improve the ability to diagnose skin moles moreprecisely and to achieve the goal of the significant reductionin melanoma mortality rate

41 Future Work Starting from the present frameworkfurther research efforts will be firstly addressed to compareand integrate the very promising approaches and corre-sponding feature descriptors reported in the most recentliterature in order to improve the classification accuracy ofmelanocytic lesions We will also conduct a follow-up studyby collecting more real data especially melanoma cases tofurther evaluate our approach One of the major problems inthe field of melanocytic skin tumors is the underdiagnosisof melanoma as a benign melanocytic or nonmelanocyticlesion To decrease the amount of dermoscopic pitfalls thenumber of melanocytic lesion types will be extended for abetter evaluation of melanocytic lesion

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This scientific work was partly supported by the National Sci-ence Centre based on the Decision no 201101NST706783

References

[1] Skin September 2015 httpsciencenationalgeographiccomsciencehealth-and-human-bodyhuman-bodyskin-article

[2] G Argenziano and H P Soyer Interactive Atlas of DermoscopyBook and CDWeb Resource Edra Medical Publishing and NewMedia Milan Itlay 2000

[3] K Korotkov and R Garcia ldquoComputerized analysis of pig-mented skin lesions a reviewrdquoArtificial Intelligence inMedicinevol 56 no 2 pp 69ndash90 2012

[4] P Soyer G Argenziano R Hofmann-Wellenhof and RH JohrColor Atlas of Melanocytic Lesions of the Skin Springer NewYork NY USA 2007

[5] Melanoma Foundation of New Zealand About melanomakey information 2014 httpwwwmelanomanetworkconzAbout-MelanomaKey-Information

[6] E Stocketh T Rosen and S Shumack Managing Skin CancerSpringer 2010

[7] S W Menzies L Bischof H Talbot et al ldquoThe performanceof SolarScan an automated dermoscopy image analysis instru-ment for the diagnosis of primary melanomardquo Archives ofDermatology vol 141 no 11 pp 1388ndash1396 2005

[8] M Burroni R CoronaGDellrsquoEva et al ldquoMelanoma computer-aided diagnosis reliability and feasibility studyrdquoClinical CancerResearch vol 10 no 6 pp 1881ndash1886 2004

[9] A Masood and A A Al-Jumaily ldquoComputer aided diagnosticsupport system for skin cancer a review of techniques andalgorithmsrdquo International Journal of Biomedical Imaging vol2013 Article ID 323268 22 pages 2013

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 16: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

16 BioMed Research International

[10] M E Celebi W V Stoecker and R H Moss ldquoAdvances inskin cancer image analysisrdquo ComputerizedMedical Imaging andGraphics vol 35 no 2 pp 83ndash84 2011

[11] H Zhou M Chen R Gass et al ldquoFeature-preserving artifactremoval from dermoscopy imagesrdquo in Medical Imaging ImageProcessing vol 6914 of Proceedings of SPIE pp 1ndash9 San DiegoCalif USA February 2008

[12] P Wighton T K Lee and M S Atkins ldquoDermascopic hairdisocclusion using inpaintingrdquo in Medical Imaging Image Pro-cessing vol 6914 ofProceedings of SPIE pp 1ndash8 SanDiego CalifUSA February 2008

[13] Q Abbas I F Garcia M Emre Celebi and W Ahmad ldquoAfeature-preserving hair removal algorithm for dermoscopyimagesrdquo Skin Research and Technology vol 19 no 1 pp e27ndashe36 2013

[14] J Jaworek-Korjakowska ldquoNovel method for border irregularityassessment in dermoscopic color imagesrdquo Computational andMathematicalMethods inMedicine vol 2015 Article ID 49620211 pages 2015

[15] J Jaworek-Korjakowska and R Tadeusiewicz ldquoHair removalfrom dermoscopic color imagesrdquo Bio-Algorithms and Med-Systems vol 9 no 2 pp 53ndash58 2013

[16] M E Celebi H Iyatomi G Schaefer and W V StoeckerldquoLesion border detection in dermoscopy imagesrdquoComputerizedMedical Imaging and Graphics vol 33 no 2 pp 148ndash153 2009

[17] J Jaworek-Korjakowska Analiza i detekcja struktur lokalnych wczerniaku złosliwym [Autoreferat Rozprawy Doktorskiej] AGHUniversity of Science and Technology Krakow Poland 2013

[18] E Zagrouba and W Barhoumi ldquoA prelimary approach for theautomated recognition ofmalignantmelanomardquo ImageAnalysisamp Stereology vol 23 no 2 pp 121ndash135 2004

[19] D Ballardand and C Brown Computer Vision Prentice HallProfessional Technical Reference 1st edition 1982

[20] R M Haralick ldquoA measure for circularity of digital figuresrdquoIEEE Transactions on Systems Man and Cybernetics vol 4 no4 pp 394ndash396 1974

[21] R S Montero and E Bribiesca ldquoState of the art of compactnessand circularity measuresrdquo International Mathematical Forumvol 4 no 27 pp 1305ndash1335 2009

[22] J Smietanski R Tadeusiewicz and E Łuczynska ldquoTextureanalysis in perfusion images of prostate cancermdasha case studyrdquoInternational Journal of Applied Mathematics and ComputerScience vol 20 no 1 pp 149ndash156 2010

[23] J W Choi Y W Park S Y Byun and S W Youn ldquoDifferentia-tion of benign pigmented skin lesions with the aid of computerimage analysis a novel approachrdquo Annals of Dermatology vol25 no 3 pp 340ndash347 2013

[24] M E Celebi H A Kingravi B Uddin et al ldquoA methodologicalapproach to the classification of dermoscopy imagesrdquo Comput-erized Medical Imaging and Graphics vol 31 no 6 pp 362ndash3732007

[25] C Manning P Raghavan and H Schutze Introduction toInformation Retrieval Cambridge University Press 2008

[26] J Shawe-Taylor and N Cristianini Kernel Methods for PatternAnalysis Cambridge University Press Cambridge UK 2004

[27] L Kaufman and P Rousseeuw Finding Groups in Data AnIntroduction to Cluster Analysis Wiley Hoboken NJ USA1990

[28] R M Haralick K Shanmugam and I Dinstein ldquoTexturalfeatures for image classificationrdquo IEEE Transactions on SystemsMan and Cybernetics vol 3 no 6 pp 610ndash621 1973

[29] MM Galloway ldquoTexture analysis using gray level run lengthsrdquoComputer Graphics and Image Processing vol 4 no 2 pp 172ndash179 1975

[30] A Chu C M Sehgal and J F Greenleaf ldquoUse of gray valuedistribution of run lengths for texture analysisrdquo Pattern Recog-nition Letters vol 11 no 6 pp 415ndash419 1990

[31] B V Dasarathy and E B Holder ldquoImage characterizationsbased on joint gray level-run length distributionsrdquo PatternRecognition Letters vol 12 no 8 pp 497ndash502 1991

[32] H Ganster A Pinz R Rohrer E Wildling M Binder and HKittler ldquoAutomated melanoma recognitionrdquo IEEE Transactionson Medical Imaging vol 20 no 3 pp 233ndash239 2001

[33] M DrsquoAmico M Ferri and I Stanganelli ldquoQualitative asym-metry measure for melanoma detectionrdquo in Proceedings of theIEEE International Symposium on Biomedical Imaging Nano toMacro vol 2 pp 1155ndash1158 Arlington Va USA April 2004

[34] W V Stoecker W W Li and R H Moss ldquoAutomatic detectionof asymmetry in skin tumorsrdquo Computerized Medical Imagingand Graphics vol 16 no 3 pp 191ndash197 1992

[35] U Fayyad and K Irani ldquoMulti-interval discretization ofcontinuous-valued attributes for classification learningrdquo inProceedings of the 13th International Joint Conference on ArticialIntelligence vol 2 pp 1022ndash1029 Morgan Kaufmann 1993

[36] S Aksoy and R M Haralick ldquoFeature normalization andlikelihood-based similarity measures for image retrievalrdquo Pat-tern Recognition Letters vol 22 no 5 pp 563ndash582 2001

[37] V Ng and D Cheung ldquoMeasuring asymmetries of skin lesionsrdquoin Proceedings of IEEE International Conference on SystemsMan and Cybernetics vol 5 pp 4211ndash4216 IEEE PressOrlando Fla USA October 1997

[38] Z She Y Liu and A Damatoa ldquoCombination of features fromskin pattern and ABCD analysis for lesion classificationrdquo SkinResearch and Technology vol 13 no 1 pp 25ndash33 2007

[39] B Kusumoputro and A Ariyanto ldquoNeural network diagnosisof malignant skin cancers using principal component analysisas a preprocessorrdquo in Proceedings of the IEEE International JointConference on Neural Networks and the IEEE World Congresson Computational Intelligence vol 1 pp 310ndash315 IEEE PressAnchorage Alaska USA May 1998

[40] E Claridge P N Hall M Keefe and J P Allen ldquoShape analysisfor classification ofmalignantmelanomardquo Journal of BiomedicalEngineering vol 14 no 3 pp 229ndash234 1992

[41] Y Cheng R Swamisai S E Umbaugh et al ldquoSkin lesionclassification using relative color featuresrdquo Skin Research andTechnology vol 14 no 1 pp 53ndash64 2008

[42] K Cheung Image processing for skin cancer detection malignantmelanoma recognition [PhD thesis] University of Toronto 1997

[43] T K Lee D I McLean and M S Atkins ldquoIrregularity indexa new border irregularity measure for cutaneous melanocyticlesionsrdquoMedical Image Analysis vol 7 no 1 pp 47ndash64 2003

[44] T K Lee and E Claridge ldquoPredictive power of irregular bordershapes for malignant melanomasrdquo Skin Research and Technol-ogy vol 11 no 1 pp 1ndash8 2005

[45] Y Chang R J Stanley R H Moss andW Van Stoecker ldquoA sys-tematic heuristic approach for feature selection for melanomadiscrimination using clinical imagesrdquo Skin Research and Tech-nology vol 11 no 3 pp 165ndash178 2005

[46] A J Round AW G Duller and P J Fish ldquoLesion classificationusing skin patterningrdquo Skin Research and Technology vol 6 no4 pp 183ndash192 2000

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 17: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

BioMed Research International 17

[47] Z She and P Fish ldquoAnalysis of skin line pattern for lesionclassificationrdquo Skin Research and Technology vol 9 no 1 pp73ndash80 2003

[48] R P Walvick K Patel S V Patwardhan and A P DhawanldquoClassification of melanoma using wavelet-transform-basedoptimal feature setrdquo in Proceedings of the Medical ImagingImage Processing vol 5370 of Proceedings of SPIE pp 944ndash951May 2004

[49] W V Stoecker C-S Chiang and R H Moss ldquoTexture in skinimages comparison of three methods to determine smooth-nessrdquo Computerized Medical Imaging and Graphics vol 16 no3 pp 179ndash190 1992

[50] S V Deshabhoina S E Umbaugh W V Stoecker R H Mossand S K Srinivasan ldquoMelanoma and seborrheic keratosisdifferentiation using texture featuresrdquo Skin Research and Tech-nology vol 9 no 4 pp 348ndash356 2003

[51] G M Weiss ldquoMining with rarity a unifying frameworkrdquo ACMSIGKDD Explorations Newsletter vol 6 no 1 pp 7ndash19 2004

[52] N V Chawla K W Bowyer L O Hall and W P KegelmeyerldquoSMOTE synthetic minority over-sampling techniquerdquo Journalof Artificial Intelligence Research vol 16 pp 321ndash357 2002

[53] H Liu and L Yu ldquoToward integrating feature selection algo-rithms for classification and clusteringrdquo IEEE Transactions onKnowledge and Data Engineering vol 17 no 4 pp 491ndash5022005

[54] I T Jolliffe Principal Component Analysis Springer New YorkNY USA 1986

[55] I Kononenko and E Simec ldquoInduction of decision treesusing RELIEFFrdquo in Proceedings of the ISSEK94 Workshop onMathematical and Statistical Methods in Artificial Intelligencevol 363 of International Centre forMechanical Sciences pp 199ndash220 Springer Vienna Austria 1995

[56] R Battiti ldquoUsing mutual information for selecting features insupervised neural net learningrdquo IEEE Transactions on NeuralNetworks vol 5 no 4 pp 537ndash550 1994

[57] M A Hall ldquoCorrelation-based feature selection for discreteand numeric class machine learningrdquo in Proceedings of the 17thInternational Conference on Machine Learning (ICML rsquo00) pp359ndash366 Morgan Kaufmann 2000

[58] E E Ghiselli Theory of Psychological Measurement McGraw-Hill New York NY USA 1964

[59] I H Witten and E Frank Data Mining Practical MachineLearning Tools and Techniques Morgan Kaufmann San Fran-cisco Calif USA 2005

[60] S Dreiseitl L Ohno-Machado H Kittler S Vinterbo HBillhardt and M Binder ldquoA comparison of machine learningmethods for the diagnosis of pigmented skin lesionsrdquo Journal ofBiomedical Informatics vol 34 no 1 pp 28ndash36 2001

[61] A Glowacz A Glowacz and P Korohoda ldquoRecognition ofmonochrome thermal images of synchronous motor with theapplication of binarization and nearestmean classifierrdquoArchivesof Metallurgy and Materials vol 59 no 1 pp 31ndash34 2014

[62] R Tadeusiewicz How Intelligent Should be System for ImageAnalysisvol 339 of Studies in Computational IntelligenceSpringer Berlin Germany 2011

[63] L Jost ldquoEntropy and diversityrdquo Oikos vol 113 no 2 pp 363ndash375 2006

[64] K-B Duan and S S Keerthi ldquoWhich is the bestmulticlass SVMmethod An empirical studyrdquo inMultiple Classifier Systems 6thInternational Workshop MCS 2005 Seaside CA USA June 13ndash15 2005 Proceedings vol 3541 of Lecture Notes in ComputerScience pp 278ndash285 Springer Berlin Germany 2005

[65] C Hsu and C Lin ldquoA comparison of methods for multi-class support vector machinesrdquo IEEE Transactions on NeuralNetworks vol 13 no 2 pp 415ndash425 2002

[66] A A Safi Towards computer-aided diagnosis of pigmented skinlesionson [PhD thesis] Technische Universitat Munchen 2012

[67] C-W Hsu C-C Chang and C-J Lin ldquoA practical guide tosupport vector classificationrdquo Tech Rep National Taiwan Uni-versity 2003 httpwwwcsientuedutwsimcjlinpapersguideguidepdf

[68] A M Molinaro R Simon and R M Pfeiffer ldquoPrediction errorestimation a comparison of resamplingmethodsrdquo Bioinformat-ics vol 21 no 15 pp 3301ndash3307 2005

[69] R Bouckaert ldquoChoosing between two learning algorithmsbased on calibrated testsrdquo in Proceedings of the 20th Interna-tional Conference on Machine Learning (ICML rsquo03) T Fawcettand N Mishra Eds pp 51ndash58 AAAI Press Washington DCUSA 2003

[70] P Flach Machine Learning The Art and Science of AlgorithmsThat Make Sense of Data Cambridge University Press NewYork NY USA 2012

[71] P Rubegni G Cevenini M Burroni et al ldquoAutomated diagno-sis of pigmented skin lesionsrdquo International Journal of Cancervol 101 no 6 pp 576ndash580 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 18: Research Article Automatic Classification of Specific ...downloads.hindawi.com/journals/bmri/2016/8934242.pdfmost popular melanocytic nevi to the diagnostic categories including Clark

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology