
Novel network architecture and learning algorithm for the classification of mass abnormalities in digitized mammograms

Brijesh Verma *

Artificial Intelligence in Medicine (2008) 42, 67—79

http://www.intl.elsevierhealth.com/journals/aiim

School of Computing Sciences, Central Queensland University, Bruce Highway, North Rockhampton, Queensland 4702, Australia

Received 26 April 2007; received in revised form 18 September 2007; accepted 27 September 2007

KEYWORDS: Learning algorithms; Neural networks; Digital mammography

Summary

Objective: The main objective of this paper is to present a novel learning algorithm for the classification of mass abnormalities in digitized mammograms.
Methods and material: The proposed approach consists of a new network architecture and a new learning algorithm. The original idea is based on the introduction of an additional neuron in the hidden layer for each output class. The additional neurons for the benign and malignant classes help in improving memorization ability without destroying the generalization ability of the network. The training is conducted by combining minimal distance-based similarity/random weights and direct calculation of output weights.
Results: The proposed approach can memorize training patterns with 100% retrieval accuracy as well as achieve high generalization accuracy for patterns which it has never seen before. The grey-level and breast imaging reporting and data system-based features from digitized mammograms are extracted and used to train the network with the proposed architecture and learning algorithm. The best results achieved by using the proposed approach are 100% on the training set and 94% on the test set.
Conclusion: The proposed approach produced very promising results. It has outperformed existing classification approaches in terms of classification accuracy, generalization and memorization abilities, number of iterations, and guaranteed training on a benchmark database.
Crown Copyright © 2007 Published by Elsevier B.V. All rights reserved.

* Tel.: +61 749309058; fax: +61 749309027. E-mail address: [email protected].

0933-3657/$ — see front matter. Crown Copyright © 2007 Published by Elsevier B.V. All rights reserved. doi:10.1016/j.artmed.2007.09.003


1. Introduction

Breast cancer continues to be the most common cause of cancer deaths in women. Every year more than 1 million women develop breast cancer worldwide [1,2]. A report by the National Cancer Institute [3] estimates that one in 11 women develops breast cancer in Australia, one in eight in the US, and one in nine in the UK and Canada. In 2005, an estimated 1,150,000 women worldwide were diagnosed with breast cancer and 411,000 women died from the disease [1].

As there is currently no means of preventing breast cancer, the focus in reducing deaths from the disease has been on finding breast cancer as early as possible. Early detection of cancer saves patients from the more aggressive radical treatments and increases the overall survival rate. The introduction of screening mammography in 1963 brought a major revolution in breast cancer detection and diagnosis. It is widely adopted in many countries, including Australia, as a nationwide public health care program. The decline in the number of breast cancer deaths [4] corresponds directly to an increase in routine mammography screening.

At present digital mammography is one of the most reliable procedures for early detection of breast cancer [1—3,5,6]. Digital mammography screening has advantages compared with film mammography, according to the results of a recent study reported [6] in September 2005 in the New England Journal of Medicine. Current image processing techniques for digitized mammography make the detection of primitive breast abnormalities easier; however, their classification as malignant or benign remains one of the most difficult tasks for radiologists and researchers. The key reason is a lack of individuality in benign and malignant class patterns. In many cases, both classes exhibit similar characteristics, i.e. size, shape and distribution of microcalcifications. Radiologists' interpretation of such cases often produces screening errors: either malignant cases are missed or more benign biopsies are performed. Research has shown that computer-aided diagnosis (CAD) can significantly reduce misdiagnosis. In a 2005 study published in the European Journal of Radiology, researchers found that CAD was able to correctly point out cancers that junior radiologists tended to miss more often [6]. When applied to the United States population, researchers have estimated that for every 100,000 cancers currently detected with screening mammograms (in women with no obvious signs of breast cancer, such as a lump), the use of CAD technology could result in the detection of an additional 20,500 breast cancers. The ability of neural classifiers to learn from the attributes of given class patterns and to classify the unknown

patterns of given classes into appropriate classes using the acquired knowledge has shown its potential [7—22] in the field of mammography. Advancement in computational intelligence and pattern recognition techniques [7—44] has improved the performance of computer-aided classification of breast abnormalities into malignant and benign patterns in digitized mammograms.

In this paper, we propose a neural architecture and a training algorithm which can learn and classify mass abnormalities in digitized mammograms. The approach is based on the introduction of additional neurons in the hidden layer for the benign and malignant classes and a weight adjustment technique for the calculation of weights.

The remainder of this paper is broken down into five sections. Section 2 reviews existing techniques for the classification of benign and malignant patterns. Section 3 discusses the proposed technique. Section 4 presents the experimental results obtained using the proposed architecture and learning algorithm. Section 5 presents a discussion and analysis of the obtained results. Finally, Section 6 concludes the paper and describes future research directions.

2. Review of existing techniques

The detection and classification of breast abnormalities in digitized mammograms has been studied for the last few decades and many scientific papers have been published in the literature. Recent surveys by Cheng et al. [7] and Verma and Panchal [15] present a comparative analysis of various algorithms and techniques for the diagnosis of abnormalities in digitized mammograms. Techniques such as artificial neural networks [7—20], fuzzy logic [8,23,24,37], and wavelet transforms [25,26] are the most commonly used for the classification of malignant and benign patterns in digitized mammograms. Verma and Zakos [8] used a back-propagation neural network for the classification of suspicious lesions extracted using a fuzzy rule-based detection system. They obtained an 88.9% classification rate using a manual combination of features. Zhang et al. [9] used a genetic algorithm for neural network learning in their study of breast abnormality classification. Two types of features, grey level-based statistical features and radiologist's interpretation features including patient age, were extracted from the DDSM database to test their proposed technique. They attained a good (90.5%) accuracy rate on the test set at the cost of a low accuracy rate on the training set. Chitre et al. [12] compared artificial neural networks and the statistical methods for microcalcification pattern classification. They obtained a classification rate of 60%, which was better than the statistical classifiers. A comparative study of radial basis function (RBF) and multilayer perceptron (MLP)-based neural networks for the classification of breast abnormalities using texture features was performed by Christoyianni et al. [13] and Bovis et al. [14] in their research work. They concluded that the MLP obtained 4% higher accuracy than the RBF. Wroblewska et al. [27] proposed a new segmentation and feature extraction technique for reliable classification of microcalcifications, which achieved a low classification rate (78%) on the DDSM database. Yu and Guan [31] used a multilayer feed-forward neural network to classify each potential pixel as a true or false microcalcification object. They obtained good true positive accuracy at the cost of a very low false positive rate.

Figure 1 An overview of the proposed approach.

Figure 2 Proposed network architecture.

As can be seen in the previous paragraphs and the literature [1—44], there has been a lot of research in the last few decades and neural network-based techniques have shown a promising future in breast cancer diagnosis. There is no doubt that neural networks have achieved better performance [7,8,15,43] in terms of classification accuracy than other traditional techniques. Neural networks have improved the accuracy rate for the classification of benign and malignant patterns in digitized mammography. Recent reviews [7,15] have also reported the superiority of neural networks over other techniques. However, as mentioned earlier, there are many drawbacks with neural network-based techniques for the diagnosis of breast cancer. The first such drawback is that current neural network-based techniques do not provide any assurance to radiologists that they are not going to misclassify patterns which are already in the classified database (known patterns). The second drawback is that current neural network approaches are very slow in adapting and learning new knowledge from newly acquired mammograms. The third drawback concerns generalization ability and consistency: if we try to get good accuracy on unseen mammograms (test data), the accuracy rate on seen mammograms (training data) may drop and

vice versa. The fourth and final drawback is that there are too many parameters, such as the learning rate, momentum, number of hidden units and hidden layers, which must be adjusted/optimised during training. The research work presented in this paper focuses on solving some of these major drawbacks by developing a new architecture and a learning algorithm.

3. Research methodology

The research methodology consists of four parts as follows: (1) acquisition of digitized mammograms from a benchmark database, (2) extraction of features, (3) selection of features (optional), and (4) classification of features (classifier). An overview of the research methodology is presented in Figs. 1 and 2 and details are described in the following sections.

3.1. Benchmark database of digitized mammograms

The digitized mammograms from the University of South Florida's Digital Database for Screening Mammography (DDSM) are used in this research. The reason for using this database is to allow the results of our study to be compared with those of other national and international researchers. DDSM is a benchmark database and is widely used by researchers to carry out and evaluate their research work in the area of CAD of breast cancer [32]. We have used this database in our previous research, so we have already developed software to decode the database and extract features. The database contains approximately 2500 studies of malignant, benign, benign-without-callback and normal cases. Digitized mammograms of the DDSM have already been interpreted by expert radiologists and the appropriate information has been provided in ''ics'' and ''overlay'' files. In this research 100 malignant and 100 benign cases are used for training and testing. The 100 training and 100 test patterns were taken as recommended by the DDSM benchmark so that the results can be compared with other existing and future research. How the training and testing sets were created by DDSM is described below. The description presented below is taken from DDSM's website.

Sampling a set of cases from the DDSM database to use for evaluating a spiculated mass detection algorithm required making some choices. A set of cases from the DDSM that had at least one malignant, spiculated mass was selected. For simplicity, a set of cases that were all scanned on the same scanner and that all had the ground truth marked by the same radiologist was selected. The resulting set of cases was split into a training set and a test set while attempting to balance the lesion subtlety and ACR breast density in the two datasets. The resulting list of cases can be found at http://marathon.csee.usf.edu/Mammography/Database.html. Each case contains four mammograms from a screening exam. The images were scanned on a HOWTEK 960 digitizer with a sample rate of 43.5 microns at 12 bits per pixel. The images were preprocessed to crop out much of the image that did not contain imaged breast tissue and to darken regions of the image that contained patient information or technician identifiers by setting pixels in those regions to the value zero. Each image was then compressed using a truly lossless compression algorithm.

3.2. Feature extraction

Once all suspicious areas are extracted from the digitized mammograms, grey level features are extracted for each suspicious area by applying statistical formulas to the grey level values of the suspicious area, and the BI-RADS lesion descriptor features, the patient age feature and the subtlety value feature are extracted from the associated case information files. The selected features represent the textural properties and morphological structure of an abnormality. These measures are very important considerations for effective classification of breast abnormality patterns. Extracted values of each feature are normalised to be consistent between 0 and 1.

3.2.1. Grey level-based features
Grey level-based features are calculated using statistical formulas on the grey level pixel values of each suspicious area and the respective surrounding boundary area. A total of 14 features are considered as grey level features in this study. All 14 features/formulas are briefly described below. Refer to [8] for a detailed description.

Number of pixels: it is a count of the pixels in the extracted suspicious area.

Average boundary grey: it is the average grey level value around the outside of the suspicious area:

\[ \text{average histogram} = \frac{1}{k}\sum_{j=0}^{k-1}\frac{N(j)}{T} \tag{1} \]

\[ \text{average grey} = \frac{1}{T}\sum_{g=0}^{T-1} I(g) \tag{2} \]

\[ \text{difference} = \text{average grey} - \text{average boundary grey} \tag{3} \]

\[ \text{contrast} = \frac{\text{difference}}{\text{AvgGrey} + \text{AvgBoundaryGrey}} \tag{4} \]

\[ \text{energy} = \sum_{j=0}^{k-1}[P(j)]^2 \tag{5} \]

\[ \text{modified energy} = \sum_{g=0}^{T-1}[P(I(g))]^2 \tag{6} \]

\[ \text{entropy} = -\sum_{j=0}^{k-1} P(j)\log_2[P(j)] \tag{7} \]

\[ \text{modified entropy} = -\sum_{g=0}^{T-1} P(g)\log_2[P(I(g))] \tag{8} \]

\[ \text{standard deviation } (\sigma) = \sqrt{\sum_{j=0}^{k-1}(j-\text{AvgGrey})^2\, P(j)} \tag{9} \]

\[ \text{modified standard deviation } (\sigma_m) = \sqrt{\sum_{g=0}^{T-1}(I(g)-\text{AvgGrey})^2\, P(I(g))} \tag{10} \]

\[ \text{skew} = \frac{1}{\sigma^3}\sum_{j=0}^{k-1}(j-\text{AvgGrey})^3\, P(j) \tag{11} \]

\[ \text{modified skew} = \frac{1}{\sigma_m^3}\sum_{g=0}^{T-1}(I(g)-\text{AvgGrey})^3\, P(I(g)) \tag{12} \]

where T is the total number of pixels, g the index value of image I, K the total number of grey levels (i.e. 4096), j the grey level value (i.e. 0—4095), I(g) the grey level value of pixel g in image I, N(j) the number of pixels with grey level j in image I, P(I(g)) the probability of grey level value I(g) occurring in image I, P(g) = N(I(g))/T, and P(j) the probability of grey level value j occurring in image I, P(j) = N(j)/T.
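To make the definitions above concrete, the short C++ sketch below computes a handful of these grey-level measures (average grey, energy, entropy and standard deviation) for a toy suspicious region. The pixel values, variable names and 12-bit range are illustrative assumptions; this is not the paper's original feature-extraction code.

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Sketch only: computes a few of the grey-level features defined above
// for a hypothetical suspicious region stored as raw 12-bit pixel values.
int main() {
    const int K = 4096;                                      // number of grey levels (12 bits)
    std::vector<int> region = {1000, 1020, 1020, 980, 1500, 1490, 1010, 995};
    const double T = static_cast<double>(region.size());    // total number of pixels

    // Histogram N(j), needed for the probabilities P(j) = N(j)/T.
    std::vector<double> N(K, 0.0);
    for (int g : region) N[g] += 1.0;

    double averageGrey = 0.0;
    for (int g : region) averageGrey += g;
    averageGrey /= T;                                        // Eq. (2)

    double energy = 0.0, entropy = 0.0, variance = 0.0;
    for (int j = 0; j < K; ++j) {
        if (N[j] == 0.0) continue;
        double Pj = N[j] / T;
        energy   += Pj * Pj;                                 // Eq. (5)
        entropy  -= Pj * std::log2(Pj);                      // Eq. (7)
        variance += (j - averageGrey) * (j - averageGrey) * Pj;  // inside Eq. (9)
    }
    double stdDev = std::sqrt(variance);                     // Eq. (9)

    std::cout << "average grey = " << averageGrey
              << ", energy = " << energy
              << ", entropy = " << entropy
              << ", std dev = " << stdDev << "\n";
    return 0;
}
```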

3.2.2. BI-RADS lesion descriptor features
Each case in the DDSM contains information such as breast density, an abnormality description and an abnormality assessment rank that were specified by an expert mammography radiologist using the BI-RADS lexicon [32]. This information is stored in the '.ics' file and '.OVERLAY' file of each case. The density of breast tissues and the abnormality shape/type and its distribution/outline/margin inside the breast tissues are the key factors radiologists consider when judging the likelihood of cancer being present. The abnormality assessment rank suggests the severity of the abnormality. Considering their importance in the human interpretation process, BI-RADS features have been used to evaluate their significance in computer-aided classification of breast abnormalities in digital mammograms.

The morphological descriptors for mass lesions are mass shape and mass margin. So the four BI-RADS lesion descriptor features in total are density, mass shape, mass margin, and abnormality assessment rank.

3.2.2.1. Breast tissue density. This feature gives information about the density of breast tissues, rated from 1 to 4 according to BI-RADS standards by an expert radiologist [32].

3.2.2.2. Morphological description of breast abnormalities. Radiologists consider the arrangement and structural formation of abnormalities within the breast tissue to distinguish the abnormality into cancerous (malignant) and non-cancerous (benign) classes. Thus, morphological descriptions of breast abnormality are considered as features. Like other features, morphological descriptions of breast abnormality are encoded into numeric values to get real feature values. The encoded numeric feature values for the mass shapes (round, oval, lobulated, irregular, architectural_distortion, tubular, lymph_node, asymmetric_breast_tissue, focal_asymmetric_density) and margins (circumscribed, microlobulated, obscured, ill_defined, spiculated) found in DDSM cases [32] are 1—9 and 1—5, respectively.

3.2.2.3. Abnormality assessment rank. It categorises the seriousness of the abnormality found and suggests the course of action that should be taken accordingly. It is rated from 1 to 5 by an expert radiologist [32].

3.2.2.4. Patient age feature. The DDSM also provides information such as the patient age at the time the mammogram was taken for each case study [32]. Various medical findings and past breast cancer statistics show that the risk of breast cancer increases with age. This makes patient age a very useful feature for medical practitioners to consider in the diagnosis of breast cancer. However, current breast cancer statistics reveal many exceptions to this correlation. In recognition of the correlation and the exceptions, the patient age feature is used in the proposed research work for neural classification of breast abnormalities.

3.2.2.5. Subtlety value feature. This is a subjective impression of the subtlety of a lesion by an expert radiologist. It is rated from 1 to 5 by an expert radiologist, where 1 is ''subtle'' and 5 is ''obvious''. A higher subtlety rating indicates a more obvious lesion. The subtlety value for a lesion may indicate how difficult it is to find [32]. Subtlety of a lesion plays an important role in its interpretation, as subtle mammographic lesions are associated with significantly different search parameters than obvious lesions. Cancerous abnormalities are more subtle, with ill-defined edges, compared with non-cancerous abnormalities. This is one of the key factors which affect the overall interpretation accuracy.
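Since all of these descriptors are small integer codes, a simple way to prepare them for the classifier is to map each one into [0, 1], as stated in Section 3.2. The sketch below illustrates one such encoding; the struct layout, the division-by-maximum scaling and the age ceiling of 100 are assumptions made for illustration, not the paper's exact scheme.

```cpp
#include <array>
#include <iostream>

// Hypothetical raw BI-RADS-based descriptors for one DDSM case.
struct CaseFeatures {
    int density;      // 1-4
    int massShape;    // 1-9 (round .. focal_asymmetric_density)
    int massMargin;   // 1-5 (circumscribed .. spiculated)
    int assessment;   // 1-5
    int age;          // patient age in years
    int subtlety;     // 1-5 (1 = subtle, 5 = obvious)
};

// Scale every feature into [0, 1]; the age ceiling of 100 is an assumption.
std::array<double, 6> normalise(const CaseFeatures& c) {
    return {{ c.density / 4.0,    c.massShape / 9.0, c.massMargin / 5.0,
              c.assessment / 5.0, c.age / 100.0,     c.subtlety / 5.0 }};
}

int main() {
    CaseFeatures example{3, 4, 5, 4, 62, 2};   // made-up case for illustration
    for (double v : normalise(example)) std::cout << v << ' ';
    std::cout << '\n';
    return 0;
}
```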

3.3. Classifier

As mentioned in Section 2, many classifiers including neural classifiers and traditional classifiers [7—44] have been used to classify suspicious areas in digitized mammograms into benign and malignant patterns. It has been shown [7,15,43] that neural classifiers outperform traditional classifiers. However, there are many problems: (1) current classifiers can misclassify even known cancer cases (e.g. 80% accuracy on the training set means that 20% of known cases will be misclassified), (2) they are difficult to train (training may take many days), (3) they do not provide consistency (accuracy rates are different every time you train on the same or a different database), and (4) there is no guarantee of a solution (sometimes there are no results after a few days of training). In this study, novel network architecture and a learning algorithm to overcome these problems are introduced and investigated. An overview of the network architecture is presented in Fig. 2. As seen in Fig. 2, two additional neurons for benign and malignant patterns are introduced which are connected to the output of the network. The main idea behind this is to improve the memorization/association abilities of the network without destroying/degrading its generalization capabilities. It is a well known fact that in traditional classifier learning we aim to achieve a high recognition/classification rate on test data, but we forget that the classification on training data may suffer. With current learning algorithms, it is impossible to get a 100% classification rate on training data and a high classification rate on test data, which means that the network may incorrectly classify already known cases. In a situation such as the diagnosis of breast cancer, radiologists or doctors are most unlikely to use a network which misclassifies obvious cases (true cases) from the database. The new network with 100% memorization capabilities does not misclassify any benign or malignant pattern from the training data or very close to the training data. The weights (input-hidden-output) of the additional neurons are trained differently (refer to the learning algorithm) than the rest of the network.

A new learning algorithm for training the network in Fig. 2 is investigated in this study. The main idea behind the new learning algorithm is that it is possible to minimize the output error based on calculation of the weights between the hidden layer and the output layer using least squares methods. The nonlinear sigmoidal function at the output can be converted into a linear function by using a log function. A linear system can be established for each output class and the output weights can be calculated by solving the linear system. The weights between the input and the hidden layers are randomly selected. By doing this we are removing traditional gradient-based neural networks' problems such as local minima, paralysis, long training time, uncertainty, etc. We will also have a 100% guarantee that there will always be a solution [34]. There will be no such situation that after many hours we do not have any solution. The use of this idea also makes the training very fast, which solves the problem of slow adaptation of new knowledge from new mammograms. The weights for the two additional hidden nodes/neurons (black top and bottom circles in Fig. 2) are trained separately using the following idea.
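A minimal numerical sketch of this idea follows: the sigmoid target is linearised with a logit (log) transform and the hidden-to-output weights are obtained by solving the resulting linear system in a least-squares sense. The toy hidden-layer outputs, the normal-equations solver and all identifiers are assumptions; the paper only states that a least-squares method such as Gram—Schmidt is used.

```cpp
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Sketch of the direct output-weight calculation: sigmoid targets are
// converted to linear targets with a logit transform and the hidden-to-output
// weights are obtained by least squares (normal equations + Gauss-Jordan here).
using Matrix = std::vector<std::vector<double>>;

std::vector<double> leastSquares(const Matrix& H, const std::vector<double>& t) {
    std::size_t p = H.size(), h = H[0].size();
    Matrix A(h, std::vector<double>(h + 1, 0.0));            // augmented [H^T H | H^T t]
    for (std::size_t i = 0; i < h; ++i) {
        for (std::size_t j = 0; j < h; ++j)
            for (std::size_t k = 0; k < p; ++k) A[i][j] += H[k][i] * H[k][j];
        for (std::size_t k = 0; k < p; ++k) A[i][h] += H[k][i] * t[k];
    }
    for (std::size_t i = 0; i < h; ++i) {                    // naive Gauss-Jordan elimination
        double piv = A[i][i];
        for (std::size_t j = i; j <= h; ++j) A[i][j] /= piv;
        for (std::size_t r = 0; r < h; ++r)
            if (r != i) {
                double f = A[r][i];
                for (std::size_t j = i; j <= h; ++j) A[r][j] -= f * A[i][j];
            }
    }
    std::vector<double> w(h);
    for (std::size_t i = 0; i < h; ++i) w[i] = A[i][h];
    return w;
}

int main() {
    // Hidden-layer outputs for 4 training patterns and 2 hidden units (toy values).
    Matrix H = {{0.8, 0.2}, {0.7, 0.3}, {0.2, 0.9}, {0.1, 0.8}};
    double tHigh = std::log(0.9 / 0.1);   // linearised target for patterns of this class
    double tLow  = std::log(0.1 / 0.9);   // linearised target for the other class
    // Malignant output node: first two patterns malignant, last two benign.
    std::vector<double> t = {tHigh, tHigh, tLow, tLow};
    std::vector<double> w = leastSquares(H, t);
    std::cout << "output weights: " << w[0] << ", " << w[1] << "\n";
    return 0;
}
```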

3.3.1. Minimal distance-based calculation of weights for additional neurons
The basic idea is to find a distance between two patterns which is minimal. This principle has been used in many well-known statistical and intelligent techniques [43], such as nearest neighbour, the Hamming network and distance-based clustering, to find the similarity in patterns. The idea here in the proposed approach is to maximize the output value of the class which the input pattern belongs to. The process is independent of the rest of the training and only the output neuron for the input class is fired (e.g. if the input test pattern is from the benign database then the output value for the benign class will be higher than the output value for the malignant class). The weights between the input and the benign neuron are set using the benign training patterns and the weights between the input and the malignant neuron are set using the malignant training patterns. The following function is formulated and investigated in this study:

\[ W_{cMal} = \log\!\left(\frac{target}{1-target}\right) e^{-\min(\|x - X_{allMalPatterns}\|)\,\log(target/(1-target))} \tag{13} \]

where target is the desired output for the malignant class, x is an input test pattern and X_allMalPatterns is the set of all malignant patterns from the training set.

The function maximizes the output value for a test input which has an exact match in the training data. The value decreases according to the closeness between the test input and the training data. The aim of the min function is to find the minimal value between the input and X (the weights). The output from the additional neuron is passed to the following activation function:

\[ f(out) = \begin{cases} out & \text{if } out \ge \log\!\left(\dfrac{target}{1-target}\right) \\ out \cdot gen & \text{if } out < \log\!\left(\dfrac{target}{1-target}\right) \end{cases} \tag{14} \]

where gen should be less than 0.5 for 100% classification on the training data. The output value is added to the network's output before the activation function. The aim of the above activation function is to produce the maximum value for the output of the network only if the input pattern is from the training data or similar to the training data.
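To illustrate how an additional neuron responds, the sketch below evaluates one reading of Eqs. (13) and (14) for a single test pattern against a stored set of malignant training patterns; the exact operator placement in the exponent and the multiplication by gen in the second branch follow the reconstruction above, and all numeric values are invented.

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

// Minimal-distance response of the malignant additional neuron for one
// test pattern x, following the reconstructed Eqs. (13)-(14).
double minDistance(const std::vector<double>& x,
                   const std::vector<std::vector<double>>& patterns) {
    double best = 1e30;
    for (const auto& p : patterns) {
        double d2 = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) d2 += (x[i] - p[i]) * (x[i] - p[i]);
        best = std::min(best, std::sqrt(d2));
    }
    return best;
}

int main() {
    const double target = 0.9;
    const double logit = std::log(target / (1.0 - target));   // ~2.197
    const double gen = 0.4;                                    // assumed value, < 0.5

    // Stored malignant training patterns acting as the neuron's weights.
    std::vector<std::vector<double>> mal = {{0.2, 0.8, 0.5}, {0.3, 0.7, 0.6}};
    std::vector<double> x = {0.2, 0.8, 0.5};                   // exact match -> distance 0

    double out = logit * std::exp(-minDistance(x, mal) * logit);   // Eq. (13)
    double fired = (out >= logit) ? out : out * gen;               // Eq. (14)

    std::cout << "raw output = " << out << ", after activation = " << fired << "\n";
    return 0;
}
```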

The following steps are used to train the classifier network.

Step 1: Set the network's initial parameters:
n — inputs (number of features);
h — hidden units (number of hidden units; for example, start with h = 2);
m — outputs (number of classes; for example m = 2 as we have malignant and benign);
p — training pairs (total number of benign and malignant samples for training).

Step 2: Set the weights between inputs and hidden units.
The weights between inputs and hidden units are set using the following two steps.
Step 2.1: Set the weights (Wih) between the inputs (Xn) and the hidden units (Hh). The weights are set to small random values (use the rand function in C++).
Step 2.2: Set the weights between the inputs (Xn) and the two additional hidden neurons/units (Hben, Hmal). The weights for the additional hidden neurons are calculated using the minimal distance approach as described above in Section 3.3.1. Each weight is a vector and the length of the vector is equal to p (e.g. Wiben[n][p] = X (all benign patterns)).

Step 3: Set the weights between hidden units and outputs.
The weights (Who, Wbeno, Wmalo) between hidden units and outputs are set by calculating them as shown in the steps below.
Step 3.1: Feed each input pattern to the network and calculate the output of the hidden layer (Hh).
Step 3.2: Calculate the output value (Obaf) before the activation function as follows:

\[ O_{baf} = \log\!\left(\frac{target}{1-target}\right) \tag{15} \]

Step 3.3: Set up a linear system of equations using Hh, Obaf and Who:

\[ H_h W_{ho} = O_{baf} \]

and use least squares methods (e.g. Gram—Schmidt) to calculate Who.
Step 3.4: Repeat Steps 3.2 and 3.3 for each output.
Step 3.5: Set the weights Wbeno, Wmalo to +1 and −1 as shown in Fig. 2.

Step 4: Increment #HU (hidden units), repeat Steps 2—3, and stop when the network starts memorizing instead of generalizing (find a number in terms of the training patterns, e.g. p/4, p/2).

Step 5: Select the #HU (hidden units) with the best accuracy results on a test set of the training data.

The following steps are used to test the network.

Step 1: F

eed test input (X) and output to the net-work.

Step 2: C

alculate the output of the hidden layer.The output (except benign and malig-nant neurons) is calculated as follows:

net ¼ sumðXWÞ (16)

output ¼ fðnetÞ ¼ 1

1þ e�net(17)

The output for benign and malignantneurons is calculated as follows:

Step 3: C

alculate the output of the network.The output of the network is calculatedas follows:

net ¼ sumðHWÞ (20)

output ¼ fðnetÞ ¼ 1(21)

1þ e�net
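Putting the pieces together, the following sketch performs one forward pass for a toy two-hidden-unit network along the lines of test Steps 1—3 (Eqs. (16)—(21)). All weights and patterns are invented, and the way the additional benign/malignant outputs are combined through the +1/−1 connections of Step 3.5 is an interpretation for illustration only.

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

// One forward pass through the proposed network for a single test pattern,
// loosely following test Steps 1-3 (Eqs. (16)-(21)). All numbers are toy values.
double sigmoid(double net) { return 1.0 / (1.0 + std::exp(-net)); }

double minDist(const std::vector<double>& x,
               const std::vector<std::vector<double>>& pats) {
    double best = 1e30;
    for (const auto& p : pats) {
        double d = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) d += (x[i] - p[i]) * (x[i] - p[i]);
        best = std::min(best, std::sqrt(d));
    }
    return best;
}

int main() {
    const double logit = std::log(0.9 / 0.1);
    std::vector<double> x = {0.3, 0.7};                           // test pattern
    // Random input-to-hidden weights for 2 ordinary hidden units.
    std::vector<std::vector<double>> Wih = {{0.1, -0.4}, {0.5, 0.2}};
    // Hidden-to-output weights (malignant row, benign row), e.g. from least squares.
    std::vector<std::vector<double>> Who = {{1.2, -0.8}, {-1.1, 0.9}};
    // Training patterns stored by the additional neurons.
    std::vector<std::vector<double>> malPat = {{0.3, 0.7}};
    std::vector<std::vector<double>> benPat = {{0.9, 0.1}};

    // Step 2: ordinary hidden-unit outputs, Eqs. (16)-(17).
    std::vector<double> H(2);
    for (int h = 0; h < 2; ++h)
        H[h] = sigmoid(Wih[h][0] * x[0] + Wih[h][1] * x[1]);

    // Additional neurons, reconstructed Eqs. (18)-(19) with gen = 0.5.
    auto extra = [&](const std::vector<std::vector<double>>& pats) {
        double net = logit * std::exp(-minDist(x, pats) * logit);
        return (net >= logit) ? net : net * 0.5;
    };
    double addMal = extra(malPat), addBen = extra(benPat);

    // Step 3: class outputs, Eqs. (20)-(21); one reading of the +1/-1 links of Step 3.5.
    double outMal = sigmoid(Who[0][0] * H[0] + Who[0][1] * H[1] + addMal - addBen);
    double outBen = sigmoid(Who[1][0] * H[0] + Who[1][1] * H[1] + addBen - addMal);

    std::cout << "malignant = " << outMal << ", benign = " << outBen << "\n";
    return 0;
}
```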

3.4. Theoretical underpinning

Let X (x1, x2, . . ., xn) be the input, Y (y1, y2, . . ., yn) be the output, W be the weight matrix of the hidden layer and W′ the weight matrix of the output layer.

Let x be a training pattern from X (x ∈ X), M be the set of all malignant patterns and B be the set of all benign patterns. The outputs of the proposed network, A and A′, can be calculated as follows:

\[ A \approx (xW)W' + \log\!\left(\frac{target}{1-target}\right) e^{-\min(\|x - M\|)\,\log(target/(1-target))} \tag{22} \]

\[ A' \approx (xW)W' + \log\!\left(\frac{target}{1-target}\right) e^{-\min(\|x - B\|)\,\log(target/(1-target))} \tag{23} \]

To prove that the proposed approach will always work, it is shown below that A > A′ for all x ∈ M and A′ > A for all x ∈ B (flowcharts for the training and testing processes are shown in Fig. 3).

For all x ∈ M:

\[ e^{-\min(\|x - M\|)\,\log(target/(1-target))} > e^{-\min(\|x - B\|)\,\log(target/(1-target))} \tag{24} \]

\[ \Rightarrow\ \log\!\left(\frac{target}{1-target}\right) e^{-\min(\|x - M\|)\,\log(target/(1-target))} > \log\!\left(\frac{target}{1-target}\right) e^{-\min(\|x - B\|)\,\log(target/(1-target))} \tag{25} \]

\[ \Rightarrow\ A > A' \]

For all x ∈ B:

\[ e^{-\min(\|x - B\|)\,\log(target/(1-target))} > e^{-\min(\|x - M\|)\,\log(target/(1-target))} \tag{26} \]

\[ \Rightarrow\ \log\!\left(\frac{target}{1-target}\right) e^{-\min(\|x - B\|)\,\log(target/(1-target))} > \log\!\left(\frac{target}{1-target}\right) e^{-\min(\|x - M\|)\,\log(target/(1-target))} \tag{27} \]

\[ \Rightarrow\ A' > A \]

Figure 3 Flow-chart for the training and test processes.
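As a quick numerical illustration of the argument (using the form of Eqs. (22) and (23) given above and ignoring the shared (xW)W′ term, as the proof does), take target = 0.9 and a malignant test pattern whose minimal distance to the malignant training set is 0 and to the benign set is 0.8; these distances are invented for illustration:

```latex
% Illustrative values only: target = 0.9, min||x - M|| = 0, min||x - B|| = 0.8
\[
\log\tfrac{0.9}{0.1} \approx 2.197, \qquad
2.197\, e^{-0 \cdot 2.197} = 2.197, \qquad
2.197\, e^{-0.8 \cdot 2.197} \approx 0.38,
\]
\[
\Rightarrow\ A > A' ,\ \text{so the pattern is assigned to the malignant class.}
\]
```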

4. Implementation and experimental results

The proposed approach has been implemented in C++ on the Windows XP platform and the UNIX platform. The experiments included in this paper were conducted on Windows XP (Pentium 4 machine) using the approach presented in this paper and other approaches such as MLP-BP (MLP trained using back-propagation), MLP-GA (MLP trained using a GA), DA (discriminant analysis) and LR (logistic regression). The experiments were conducted using the same database and computing environment. Table 1 presents results with 6 BI-RADS features (radiologist interpretation) and varying parameters such as iterations and hidden units. The best results are recorded here. Table 2 presents results with 14 grey level features and all 6 BI-RADS features. Table 3 presents the results with 14 grey level features and only 4 BI-RADS features.

Table 1  Results using 4 BI-RADS features, patient age feature and subtlety value feature

Performance         # of hidden units   # of iterations   Classification rates (%)
                                                          Training set   Test set
Proposed approach   10                  1                 100            84
                    10                  947               100            93
                    20                  1                 100            84
                    20                  1,560             100            94
                    32                  1                 100            90
                    32                  1,190             100            94
MLP-BP              10                  70,000            99             91
                    10                  80,000            100            83
                    20                  10,000            95             91
                    20                  60,000            100            88
MLP-GA              18                  NA                96             87
DA                  NA                  NA                85             88
LR                  NA                  NA                93             90

5. Discussion and analysis of results

The results using the proposed approach on three types of feature combinations have been presented in Tables 1—3. The ROC curves using results from the experiments are displayed in Figs. 4—6. The proposed approach obtained the highest classification rate (94%) for all three feature combinations. It achieved the highest classification rate with just a few iterations. It is good to notice that the classification rate on the training set is always 100%, which means that the proposed approach has a good memorizing ability together with an excellent generalization ability. Training in just a few iterations means that new data can be adapted in just a few minutes.

Table 2  Results using 14 grey level-based features, 4 BI-RADS features, patient age feature and subtlety value feature

Performance         # of hidden units   # of iterations   Classification rates (%)
                                                          Training set   Test set
Proposed approach   15                  1                 100            90
                    15                  1,937             100            94
                    16                  1                 100            88
                    16                  1,043             100            93
                    20                  1                 100            85
                    20                  2,010             100            91
MLP-BP              16                  30,000            100            84
                    16                  40,000            100            90
                    20                  10,000            100            91
MLP-GA              18                  NA                81             89
DA                  NA                  NA                90             88
LR                  NA                  NA                97             89


Table 3  Results using 14 grey level-based features and 4 BI-RADS features

Performance         # of hidden units   # of iterations   Classification rates (%)
                                                          Training set   Test set
Proposed approach   10                  1                 100            75
                    10                  2,649             100            94
                    12                  1                 100            91
                    12                  1,808             100            94
                    14                  1,828             100            93
MLP-BP              10                  30,000            100            90
                    14                  20,000            99             91
MLP-GA              18                  NA                81             89
DA                  NA                  NA                90             88
LR                  NA                  NA                97             89

Figure 4 ROC curve for results in Table 1.

Figure 6 ROC curve and area under the curve obtained for different features with the same 94% accuracy.

The obtained results with the highest classification rates were analyzed to find benign and malignant class errors, classes with consistent errors and the impact of hidden units. As listed in Tables 1—3, the best results produced 6% misdiagnoses regardless of the feature combination used. Table 4 presents the analysis of results with two different numbers of hidden units, the same input features and the same classification rates. In both cases (20 and 32 hidden units), the malignant class has only one misclassification error and the benign class has five misclassification errors. However, not all misclassified cases were the same. The class pattern numbers (mammograms) 14, 38 and 94 were consistently misclassified in both cases.

Figure 5 ROC curve for results in Table 2.

Table 5 presents the analysis of results with two different numbers of hidden units, different classification rates and the same features (14 grey level-based features, 4 BI-RADS features, patient age feature and subtlety value feature). It is interesting to see that the change of features resulted in more misclassifications in the malignant class. The misclassification errors for benign and malignant are the same (3 each for 15 hidden units). The consistency of misclassified patterns is better, as five misclassified cases (14, 38, 45, 75, 81) were the same in both cases (hidden units 15 and 16). The grey level-based features produced more misclassified malignant cases than the BI-RADS features alone.

The proposed network can automatically find the number of hidden units required for the best solution (highest classification rate on test data). It gives the best number of hidden units in the first run. It was found that the number of hidden units with the highest classification rate in the first run can achieve the best solution. This was true for all the experiments.

The results obtained by the proposed approach are better than MLP-BP, MLP-GA, DA and LR. The results are also much better than most of the recently published results in the literature. Sometimes it is very difficult to compare results from various techniques because researchers use different databases. We have selected some recent papers with results using the DDSM database for comparison purposes. Zhang et al. [9] used the same DDSM benchmark database and reported the highest 90.5% classification rate for the calcification cases and an 87.2% classification rate for the mass cases. The highest results obtained (94%) in this research are for mass cases, so nearly a 7% improvement has been achieved. Wroblewska et al. [27] used the same DDSM benchmark database and reported a 76% classification rate, which is much lower (18%) than the classification rate obtained in this research. Masotti [40] used the DDSM benchmark database and a support vector machine (SVM) and achieved 90% accuracy, which is 4% less than the accuracy obtained in this research. Figs. 4—6 and Tables 6 and 7 show the ROC curves and areas under the curves for various parameters such as hidden units, classification accuracy, etc. Overall, the proposed technique has outperformed all the existing neural network-based and statistical techniques.

Table 4  Different hidden units, same features (BI-RADS, patient age and subtlety value features) and same classification rate

Case #   Correct outputs       Network outputs   Network class

Hidden units: 20; classification rates: 94% (benign errors (a): 5; malignant errors (b): 1; sensitivity: 98%; specificity: 90%)
14       0.1 0.9 (benign)      0.782 0.218       Malignant
38       0.1 0.9 (benign)      0.912 0.088       Malignant
64       0.1 0.9 (benign)      0.552 0.448       Malignant
81       0.9 0.1 (malignant)   0.462 0.538       Benign
92       0.1 0.9 (benign)      0.634 0.366       Malignant
94       0.1 0.9 (benign)      0.964 0.036       Malignant

Hidden units: 32; classification rates: 94% (benign errors (a): 5; malignant errors (b): 1; sensitivity: 98%; specificity: 90%)
14       0.1 0.9 (benign)      0.720 0.280       Malignant
38       0.1 0.9 (benign)      0.931 0.069       Malignant
64       0.1 0.9 (benign)      0.327 0.673       Benign
81       0.9 0.1 (malignant)   0.504 0.496       Malignant
92       0.1 0.9 (benign)      0.403 0.597       Benign
94       0.1 0.9 (benign)      0.997 0.003       Malignant
50       0.1 0.9 (benign)      0.529 0.471       Malignant
62       0.1 0.9 (benign)      0.806 0.194       Malignant
45       0.9 0.1 (malignant)   0.391 0.609       Benign

Patterns 14, 38, 94 are misclassified in both cases.
(a) Benign classified as malignant.
(b) Malignant classified as benign.

Table 5  Different hidden units, same features and different classification rates

Case #   Correct outputs       Network outputs   Network class

Hidden units: 15; classification rates: 94% (benign errors (a): 3; malignant errors (b): 3; sensitivity: 94%; specificity: 94%)
2        0.1 0.9 (benign)      0.503 0.497       Malignant
14       0.1 0.9 (benign)      0.709 0.291       Malignant
38       0.1 0.9 (benign)      0.834 0.166       Malignant
45       0.9 0.1 (malignant)   0.338 0.662       Benign
75       0.9 0.1 (malignant)   0.318 0.682       Benign
81       0.9 0.1 (malignant)   0.268 0.732       Benign

Hidden units: 16; classification rates: 93% (benign errors (a): 3; malignant errors (b): 4; sensitivity: 92%; specificity: 94%)
14       0.1 0.9 (benign)      0.789 0.211       Malignant
38       0.1 0.9 (benign)      0.885 0.115       Malignant
45       0.9 0.1 (malignant)   0.343 0.657       Benign
75       0.9 0.1 (malignant)   0.240 0.760       Benign
81       0.9 0.1 (malignant)   0.351 0.649       Benign
63       0.9 0.1 (malignant)   0.414 0.586       Benign
94       0.1 0.9 (benign)      0.820 0.180       Malignant

Patterns 14, 38, 45, 75, 81 are misclassified in both cases.
(a) Benign classified as malignant.
(b) Malignant classified as benign.

Table 6  Area under the curve

                                                         Area    S.E.
Table 1 (proposed approach, #HU = 20, accuracy = 94%)    0.957   0.024
Table 1 (proposed approach, #HU = 32, accuracy = 90%)    0.943   0.027

Table 7  Area under the curve

                                                         Area    S.E.
Table 2 (proposed approach, #HU = 15, accuracy = 94%)    0.964   0.017
Table 2 (proposed approach, #HU = 16, accuracy = 93%)    0.947   0.022

6. Conclusions

We have presented a novel neural architecture and a learning algorithm for the classification of mass abnormalities in digitized mammograms. The proposed methodology has been implemented and various experiments on a digital mammography benchmark database have been conducted. The experiments using the proposed approach on a benchmark database produced 94% classification accuracy on the test set and 100% classification accuracy on the training set. The results are very promising. The proposed approach has significantly improved the results in terms of classification rates, number of iterations, memorization, generalization and fast guaranteed training.

There are four major advantages of the proposed approach over existing approaches for breast cancer diagnosis. The first advantage is that the medical community, in particular radiologists, can have 100% assurance that a system based on the proposed architecture and learning algorithm is not going to misclassify patterns from an already classified/diagnosed database. The second advantage is that the new approach can adapt and learn new knowledge from new training patterns very quickly. The third advantage is that it can generalize much better than a multilayer perceptron with the standard backpropagation algorithm. The fourth and final advantage is that there are not many parameters, such as learning rate, momentum, etc., which have to be adjusted during training.

Various sets of statistical and breast imaging reporting and data system-based features were investigated. Different features produced different classification errors. A feature selection process can be implemented and used, which may further improve the classification accuracy.

In our future research, we would like to conduct more experiments on the DDSM and other benchmark databases. Various new activation functions for the additional and output neurons will be investigated. We would like to further test the automatic finding of hidden units for large numbers of training data. A feature selection process will also be implemented and incorporated with the proposed approach.

Acknowledgments

The author would like to thank Rinku Panchal, a PhD student at Central Queensland University, and Ping Zhang, a PhD student at Bond University, for helping with the MLP-BP, MLP-GA, DA, and LR experiments.

References

[1] Parkin M, Bray F, Ferlay J, Pisani P. Global Cancer Statistics, A Cancer Journal for Clinicians, vol. 55, American Cancer Society; 2005. p. 74—108.

[2] Breast Cancer Facts; 2005. www.breastcancerfund.org (accessed September 12, 2007).

[3] National Breast Cancer Centre, http://www.nbcc.org.au/bestpractice/statistics/ (accessed September 12, 2007).

[4] American College of Radiology. www.acr.org (accessed September 12, 2007).

[5] NSW Breast Cancer Institute, http://www.bci.org.au (accessed September 12, 2007).

[6] Pisano ED, Gatsonis C, Hendrick E, Yaffe M, Baum K, Acharyya S, et al. Diagnostic performance of digital versus film mammography for breast cancer screening. N Engl J Med 2005;353:1773—83.

[7] Cheng HD, Cai X, Chen X, Hu L, Lou X. Computer-aided detection and classification of microcalcification in mammograms: a survey. Pattern Recogn 2003;36:2967—91.

[8] Verma B, Zakos J. A computer-aided diagnosis system for digital mammograms based on fuzzy-neural and feature extraction techniques. IEEE Trans Inform Technol Biomed 2001;5:46—54.

[9] Zhang P, Verma B, Kumar K. A neural-genetic algorithm for feature selection and breast abnormality classification in digital mammography. In: Proceedings of IEEE international joint conference on neural networks, IEEE, vol. 3; 2004. p. 2303—9.

[10] Zhang P, Verma B, Kumar K. Neural vs. statistical classifier in conjunction with genetic algorithm feature selection in digital mammography. Pattern Recogn Lett 2005;26(7):909—19.

[11] Wu Y, He J, Man Y, Arribas JI. Neural network fusion strategies for identifying breast masses. In: Proceedings of IEEE international joint conference on neural networks, IEEE, vol. 3; 2004. p. 2437—42.

[12] Chitre Y, Dhawan AP, Moskowitz M. Artificial neural network based classification of mammographic microcalcifications using image structure features. In: Bowyer KW, Astley S, editors. State of the art in digital mammographic image analysis, vol. 9. Singapore: World Scientific; 1994. p. 167—97.

[13] Christoyianni L, Dermatas E, Kokkinakis G. Neural classification of abnormal tissues in digital mammography using statistical features of the texture. In: Proceedings of the 6th IEEE international conference on electronics, circuits and systems, IEEE, vol. 1; 1999. p. 117—20.

[14] Bovis K, Singh S, Fieldsend J, Pinder C. Identification of masses in digital mammograms with MLP and RBF nets. In: Proceedings of IEEE international joint conference on neural networks, IEEE, vol. 1; 2000. p. 342—7.

[15] Verma B, Panchal R. Neural networks for the classification of benign and malignant patterns in digital mammograms. In: Fulcher J, editor. Advances in applied artificial intelligence. USA: Idea Group, Inc.; 2006.

[16] Markey M, Lo J, Tourassi G, Floyd C. Self-organizing map for cluster analysis of a breast cancer database. Artif Intell Med 2003;27:113—27.

[17] Wu Y. Application of neural networks in mammography. Radiology 1993;187:81—7.

[18] Guran M, Chan H, Sahiner B, Hadjiiski L, Petrick N, Helvie M. Optimal neural network architecture selection: improvement in computerized detection of microcalcifications. Acad Radiol 2002;9(4):420—9.

[19] Guran M, Sahiner B, Chan H, Hadjiiski L, Petrick N. Selection of an optimal neural network architecture for computer-aided detection of microcalcifications — comparison of automated optimization techniques. Med Phys 2001;28(9):1937—48.

[20] Papadopoulos A, Fotiadis DI, Likas A. An automatic microcalcification detection system based on a hybrid neural network classifier. Artif Intell Med 2002;25:149—67.

[21] Lo JY, Land WH, Morrison CT. Application of evolutionary programming and probabilistic neural networks to breast cancer diagnosis. In: Proceedings of IEEE international joint conference on neural networks, IEEE, vol. 5; 1999. p. 3712—6.

[22] Lo JY, Land WH, Morrison CT. Evolutionary programming technique for reducing complexity of artificial neural networks for breast cancer diagnosis. In: Hanson KM, editor. Proceedings of SPIE (Society of Photo-Optical Instrumentation Engineers), SPIE, vol. 3979; 2000. p. 153—8.

[23] Pena-Reyes CA, Sipper M. A fuzzy-genetic approach to breast cancer diagnosis. Artif Intell Med 1999;17:131—55.

[24] Cheng H, Lui YM, Freimanis RI. A novel approach to microcalcification detection using fuzzy logic technique. IEEE Trans Med Imag 1998;17:442—50.

[25] Yoshida H, Nishikawa RN, Geiger ML, Doi K. Signal/background separation by wavelet packets for detection of microcalcifications in mammograms. In: Unser MA, Aldroubi A, Laine AF, editors. Proceedings of SPIE on wavelet applications in signal and image processing, SPIE, vol. 2825; 1996. p. 805—11.

[26] Wang TC, Karayiannis NB. Detection of microcalcification in digital mammograms using wavelets. IEEE Trans Med Imag 1998;17:498—509.

[28] Lo JY, Gavrielides M, Markey MK, Jesneck JL. Computer-aidedclassification of breastmicrocalcification clusters:merging offeatures from image processing and radiologists. In: Sonka M,Fitzpatrick JM, editors. Proceedings of SPIE on medical ima-ging: image processing, SPIE, vol. 5032. 2003. p. 882—9.

[29] Bovis K, Singh S. Detection of masses in mammograms usingtexture measures. In: Proceedings of IEEE internationalconference on pattern recognition, IEEE, vol. 2; 2000.p. 267—70.

[30] Markey MK, Lo JY, Floyd CE. Differences between computer-aided diagnosis of breast masses and that of calcifications.Radiology 2002;223:489—93.

[31] Yu S, Guan L. A CAD system for the automatic detection ofclusteredmicrocalcifications in digitizedmammogram films.IEEE Trans Med Imag 2000;19:115—26.

[32] Heath M, Bowyer K, Kopans D, Moore R, Kegelmeyer Jr P. Thedigital database for screening mammography. In: Yaffe MJ,editor. Proceedings of international workshop on digitalmammography. Madison, USA: Medical Physics Publishing;2000. p. 212—8.

[33] Rangayyan RM, Mudigonda NR, Leo JE. Boundary modelingand shape analysis methods for classification of mammo-graphic masses. Med Biol Eng Comput 2000;38:487—96.

[34] Verma B. Fast training of multilayer perceptrons (MLPs).IEEE Trans Neural Netw 1997;8(6):1314—21.

[35] Sahiner B, Chan H, Petrick N, Helvie M, Goodsitt M. Design ofa high-sensitivity classifier based on a genetic algorithm:application to computer-aided diagnosis. Phys Med Biol1998;43:2853—71.

[36] Anastasio M, Yoshida H, Nagel R, Nishikawa R, Doi K. Agenetic algorithm-based method for optimizing the perfor-mance of a computer-aided diagnosis scheme for detectionof clustered microcalcifications in mammograms. Med Phys1998;25:1613—20.

[37] Lee Y, Tsai D. Computerized classification of microcalcifica-tions on mammograms using fuzzy logic and geneticalgorithm. In: Fitzpatrick JM, Sonka M, editors. Proceedingsof SPIE on medical imaging, SPIE, vol. 5370. 2004. p. 952—9.

[38] Khuwaja GA, Abu-Rezq AN. Bi-model breast cancer classifi-cation system. Pattern Anal Appl 2004;7:235—42.

[39] Shen L, Rangayyan RM, Desautels JEL. Detection and classi-fication of mammographic calcifications. Int J PatternRecogn Artif Intell 1993;7:1403—16.

[40] Masotti M. A ranklet-based image representation for massclassification in digital mammograms. Med Phys 2006;33(10):3951—61.

[41] Sahiner B, Petrick N, Chan H-P, Hadjiiski L, Paramagul C,Helvie M, et al. Computer-aided characterization of mam-mographic masses: accuracy of mass segmentation and itseffects on characterization. IEEETransMed Imag2001;20(12):1275—84.

[42] Sahiner B, Chan H-P, Petrick N, Helvie M, Hadjiiski L.Improvement of mammographic mass characterization usingspiculation measures and morphological features. Med Phys2001;28(7):1455—65.

[43] Looney C. Pattern recognition using neural networks. NewYork, USA: Oxford University Press; 1997.

[44] Kallergi M, Carney M, Gaviria J. Evaluating the performanceof detection algorithms in digital mammography. Med Phys1999;26(2):267—75.