an efficient scheme for lung nodule detection

AN EFFICIENT SCHEME FOR LUNG NODULE DETECTION

Furqan Shaukat

11-UET/PhD-EE-51

Supervisor

Prof. Dr. Gulistan Raja

DEPARTMENT OF ELECTRICAL ENGINEERING

FACULTY OF ELECTRONICS & ELECTRICAL ENGINEERING

UNIVERSITY OF ENGINEERING AND TECHNOLOGY

TAXILA

April 2018


AN EFFICIENT SCHEME FOR LUNG NODULE DETECTION

Author

Furqan Shaukat

11-UET/PhD-EE-51

A thesis submitted in partial fulfillment of the requirements for the degree of

Ph.D. Electrical Engineering

Thesis Supervisor:

Prof. Dr. Gulistan Raja

Electrical Engineering Department

UET Taxila

DEPARTMENT OF ELECTRICAL ENGINEERING

FACULTY OF ELECTRONICS & ELECTRICAL ENGINEERING

UNIVERSITY OF ENGINEERING AND TECHNOLOGY, TAXILA

April 2018


DECLARATION

I certify that the research work titled “An Efficient Scheme for Lung Nodule Detection” is my own work. The work has not been presented elsewhere for assessment. Where material from other sources has been used, it has been properly acknowledged/referenced.

Signature of Student

Furqan Shaukat

11-UET/PhD-EE-51


DEDICATION

…to the loving memories of my Father who always had belief in me.


ACKNOWLEDGEMENTS

First of all, I am very thankful to Almighty ALLAH who has given me the strength and courage

to work on the thesis.

Special thanks to my supervisor Prof. Dr. Gulistan Raja for his guidance and technical support in the development of this thesis. It has been a long journey, beginning with my master's degree, and from him I learnt the art of being committed to the task and being professional. He has been very supportive and kind throughout this thesis.

I would also like to thank Prof. Alejandro Frangi for his continuous support and help during

my stay at CISTIB, Department of Electronic and Electrical Engineering, University of

Sheffield. It has been my privilege to work with him.

Last but not least, I am grateful for the patience of my family, who endured a lot during my work on this thesis.

Furqan Shaukat


TABLE OF CONTENTS

DECLARATION

ACKNOWLEDGEMENTS

TABLE OF CONTENTS

EXECUTIVE SUMMARY

Chapter 1: INTRODUCTION

1.1 Research Background and Significance

1.2 Lung Cancer and Nodules

1.2.1 Imaging Features and Analysis of Pulmonary Nodules CT

1.3 Computer Aided Detection

1.4 Organization of Thesis

Chapter 2: LITERATURE REVIEW

2.1 Lung Segmentation

2.1.1 Shape-Based Techniques

2.1.2 Edge Based Techniques

2.1.3 Thresholding Based Techniques

2.1.4 Deformable Boundary Techniques

2.2 Lung Nodule Detection

2.3 False Positive Reduction

2.4 Problem Statement

Chapter 3: PROPOSED SCHEME FOR LUNG NODULE DETECTION

3.1 Lung Segmentation

3.1.1 Lung Image Preprocessing

3.1.2 Lung Parenchyma Segmentation

3.2 Image Enhancement and Nodule Detection

3.2.1 Theoretical Research on Image Enhancement Algorithm

3.2.2 Multi-Scale Enhancement Algorithm Based on Hessian Matrix

3.3 Lung Nodule Detection and Classification

3.3.1 Rule-Based Analysis of Lung Nodule Candidates

3.3.2 Feature Extraction

3.4 Classification of pulmonary nodules

3.4.1 Support Vector Machine Classifier

Chapter 4: RESULTS AND DISCUSSION

4.1 Dataset and Hit Criteria

4.1.1 DICOM Resources

4.2 Experimental Environment

4.2.1 Image Preprocessing Module

4.2.2 Lung Segmentation Module

4.2.3 Image Enhancement Module

4.2.4 Lung Nodule Segmentation Module

4.2.5 Lung Nodule Classification Module

4.2.6 Supplementary Functions Module

4.3 Classification Results of SVM with Different Kernel Functions

4.4 Classification Results of SVM with Different Kernel Scale and Penalty Factor

4.5 Classification Results of Different Classifiers

4.6 Feature Ranking

4.7 Comparison with Other Systems

Chapter 5: CONCLUSION AND FUTURE PROSPECT

5.1 Conclusion

5.2 Follow-up Work and Prospects

REFERENCES

APPENDIX A

APPENDIX B

ABBREVIATIONS


LIST OF FIGURES

Figure 1-1: Sample images of four nodule groups (encircled). From left to right, well-circumscribed, juxta-vascular, juxta-pleural and pleural tail nodules.

Figure 2-1: Process of lung nodule detection, consisting of image acquisition followed by lung segmentation, nodule detection and false positive reduction or classification.

Figure 3-1: Flow Chart of the Proposed Method

Figure 3-2: Lung Parenchymal Segmentation Flow Chart

Figure 3-3: Example images of lung volume segmentation; (a) to (e), from left to right, present the input, thresholded, hole-filled, lung-segmented and contour-corrected images, respectively.

Figure 3-4: (a) A parenchymal image, (b) the image after repair of the lung parenchyma and (c) a zoomed view of the left and right lung contour separation.

Figure 3-5: Multi-scale circular filter enhancement algorithm flow chart

Figure 3-6: Example images showing results of image enhancement at different slices. (a) and (b) show a low-density nodule (red circle) that is detected after image enhancement, whereas (c) and (d) show two other slices after image enhancement.

Figure 3-7: Examples of detected candidates: (a) nodules and (b) non-nodules. Nodule diversity and the close resemblance of nodules to other anatomical structures in the lung region make detection challenging and produce false positives, which are reduced with the aid of a classifier.

Figure 3-8: Flow Chart of Classification Process

Figure 3-9: Pulmonary Nodule Classification Results

Figure 4-1: User Interface of Lung CAD

Figure 4-2: Sample Output of Image Processing Module

Figure 4-3: Sample Output of Lung Segmentation Module

Figure 4-4: Sample Output of Image Enhancement Module

Figure 4-5: Sample Results of Image Preprocessing, Segmentation and Enhancement Modules on Top, Bottom and Middle Slices of Different Scans

Figure 4-6: Grid Search Results for SVM-Gaussian

Figure 4-7: Grid Search Results for SVM-Cubic

Figure 4-8: Grid Search Results for SVM-Quadratic

Figure 4-9: ROC curves of the SVM classifier with different kernel functions using the (a) 2-fold, (b) 5-fold and (c) 7-fold schemes.

Figure 4-10: ROC curves of the SVM classifier with (a) different kernel scale γ values, varying from 0.3 to 3, and (b) different penalty parameter C values, varying from 1 to 4.

Figure 4-11: (a) ROC curves of the SVM classifier with different feature classes, (b) ROC curves of different classifiers

Figure 4-12: Number of False Negatives with respect to Size

Figure 4-13: Percentage of False Negatives with respect to Size

Figure 4-14: Detection Sensitivity with respect to Nodule Size

Figure 4-15: Comparison of System's Overall Performance w.r.t. Different Agreement Levels

Figure 4-16: Sample nodules missed by the proposed system (False Negatives), indicated by red arrows. Encircled objects in the respective figures represent False Positives.

Figure 4-17: Sample images of nodules detected by the proposed system (True Positives, highlighted). The arrows in the respective figures indicate False Positives detected by the system along with the True Positives.

Figure 4-18: FROC curves of the proposed system with respect to the different kernel functions of the SVM classifier.

Figure A-1: Structure of Data Elements

Figure A-2: DICOM Image Resolution Flow Chart

Figure A-3: DICOM File Binary Encoding

Figure B-1: Structure of Pulmonary Nodules' XML


LIST OF TABLES

Table 2-1: Review of Lung Segmentation Techniques

Table 2-2: Review of Lung Nodule Detection Methods

Table 2-3: Review of Current CAD Systems

Table 3-1: Extracted Features of Nodule Candidates

Table 3-2: Feature Correlation Information

Table 4-1: Classification Results of SVM on test dataset with different kernel functions

Table 4-2: Classification Results of SVM-Gaussian on test dataset using different γ values

Table 4-3: Classification Results of SVM-Gaussian on test dataset using different C values

Table 4-4: Classification Results of different classifiers on test dataset

Table 4-5: Classification Results of SVM-Gaussian on test dataset using different feature classes

Table 4-6: Performance Comparison of Different CAD Systems

Table 4-7: Average Scores of Different Characteristics of Sample False Negatives

Table A-1: Data elements of explicit VR for type OB, OW, OF, SQ, UT, or UN

Table A-2: Data elements for other types of explicit VRs

Table A-3: Transfer Syntax Comparison Table


EXECUTIVE SUMMARY

Lung cancer has been one of the major threats to human life for decades in both developed and underdeveloped countries, with one of the lowest survival rates after diagnosis. The survival rate can be increased by early nodule detection. Computer Aided Detection (CAD) can be an important tool for early lung nodule detection and for preventing deaths caused by lung cancer. In this dissertation, we propose a novel technique for lung nodule detection using a hybrid feature set. The proposed method starts with pre-processing to remove noise from the input images, followed by lung segmentation using optimal thresholding. The image is then enhanced using multi-scale dot enhancement filtering prior to nodule detection and feature extraction. Finally, lung nodules are classified using a Support Vector Machine (SVM) classifier. The feature set consists of intensity, shape (2D and 3D) and texture features, which were selected to optimize sensitivity and reduce false positives. In addition to the SVM, other supervised classifiers such as K-Nearest-Neighbour (KNN), Decision Tree and Linear Discriminant Analysis (LDA) have been used for performance comparison. The extracted features have also been compared class-wise to determine the most relevant features for lung nodule detection. The proposed system has been evaluated on 850 scans from the Lung Image Database Consortium (LIDC) dataset using a k-fold cross-validation scheme. The main research work done in this dissertation is summarized below.

1. The proposed method starts by segmenting the lung volume from the pre-processed input CT images. Lung segmentation is critical because it is a prerequisite for nodule detection: an inaccurate lung volume segmentation lowers the accuracy of the whole system. In this dissertation, we propose a fully automated method for segmenting the lung volume from CT scan images, consisting of a series of steps. First, the CT image is segmented using optimal thresholding, and the lung volume is extracted using connected component labeling, which also removes other irrelevant structures. The resulting image contains holes, which are filled using a hole-filling algorithm based on morphological operations. Finally, the lung contour is smoothed with a rolling ball algorithm so that juxta-pleural nodules are included.

2. After lung segmentation, image enhancement is performed to detect low-density nodules. Image enhancement plays an important role in detecting these nodules, both by enhancing them and by reducing false positives through the suppression of other structures in the lung region. In this thesis, a multi-scale dot enhancement filter is used to detect low-density nodules, which might otherwise remain undetected and degrade the accuracy of the system. In the first step, Gaussian smoothing is applied to all the corresponding 2D slices to reduce noise and its effect on the subsequent derivative computation. After Gaussian smoothing, the Hessian matrix and its eigenvalues λ1 and λ2 (ordered so that |λ2| < |λ1|) are calculated for every pixel to determine the local shape of the structure. A suspected pulmonary nodule region appears as a circular or oval object, whereas vascular structures appear as elongated, line-like objects. For a bright blob both eigenvalues are large and negative, while for a bright line one eigenvalue is close to zero, so this property can be used to distinguish the different shapes present in the lung region. The process is repeated at several scales, and the filter outputs are integrated by taking the maximum response at each pixel to obtain the best enhancement and generate the resultant image. After image enhancement, lung nodule candidates are detected using optimal thresholding. A rule-based analysis, based on initial measurements such as area, diameter and volume, then decides whether to keep or discard each detected candidate. The advantage of this rule-based analysis is that it eliminates objects that are too small or too large to be nodule candidates and thus reduces the workload of the next stage.

3. A hybrid feature set, obtained through rigorous experimentation, increases the classification accuracy and considerably reduces the number of false positives per scan (FP/scan). The proposed feature set plays a crucial role in the overall performance of the CAD system. We initially selected a large pool of features and then trimmed it down on the basis of accuracy and FP/scan to obtain the proposed hybrid feature set.

4. Pulmonary nodules are classified using the SVM algorithm. In the classification phase, the suspected pulmonary nodules are divided into true pulmonary nodules and false pulmonary nodules. As a hyperplane-based classifier operating on a high-dimensional, multi-feature space, the SVM performs considerably well when it must decide between only two classes, i.e., nodule or non-nodule, which is the case for the features of the suspected pulmonary nodules. The Gaussian Radial Basis Function (RBF) kernel implicitly maps these features into a higher-dimensional space in which the two classes become more nearly linearly separable, which makes the detection and classification of pulmonary nodules more accurate.

5. We have extensively evaluated the proposed system on the Lung Image Database Consortium (LIDC) dataset, a publicly available database accessible from The Cancer Imaging Archive (TCIA). We considered 850 scans (LIDC-IDRI-0001 to LIDC-IDRI-0844) of this dataset, which contain nodules of 3-30 mm fully annotated by four expert radiologists in two consecutive sessions. A k-fold cross-validation scheme, with k = 5, 7 and 10, is used for model selection and validation, and an exhaustive grid search is used to tune the hyperparameters of the SVM classifier. Other classifiers have also been used to classify the lung nodule candidates, and an attempt has been made to determine the most relevant feature class for lung nodule detection. The achieved sensitivities at the detection and classification stages are 94.20% and 98.15%, respectively, with only 2.19 FP/scan. These results show that our scheme outperforms the compared systems, with higher sensitivity and fewer FP/scan.
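The lung segmentation steps summarized in item 1 can be sketched in Python with NumPy and SciPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the "optimal threshold" is computed with an iterative isodata-style rule, the two largest dark components that do not touch the image border are assumed to be the lungs, and a morphological closing with a disk stands in for the rolling ball step. The function names and the `closing_radius` parameter are hypothetical.

```python
import numpy as np
from scipy import ndimage


def optimal_threshold(image, tol=0.5):
    """Iterative (isodata-style) threshold: move the threshold to the
    midpoint of the two class means until it stops changing."""
    t = float(image.mean())
    while True:
        low, high = image[image <= t], image[image > t]
        if low.size == 0 or high.size == 0:
            return t
        t_new = 0.5 * (float(low.mean()) + float(high.mean()))
        if abs(t_new - t) < tol:
            return t_new
        t = t_new


def segment_lungs(slice_hu, closing_radius=5):
    """Return a boolean lung mask for one 2D CT slice given in HU
    (hypothetical helper; assumption-based sketch)."""
    # 1. Optimal thresholding: air-filled lung tissue falls below t.
    t = optimal_threshold(slice_hu)
    dark = slice_hu < t

    # 2. Connected component labeling: discard dark components touching
    #    the image border (air outside the body), keep the two largest.
    labels, n = ndimage.label(dark)
    border = set(np.concatenate([labels[0], labels[-1],
                                 labels[:, 0], labels[:, -1]]).tolist())
    sizes = np.bincount(labels.ravel(), minlength=n + 1)
    inner = [(int(sizes[lab]), lab) for lab in range(1, n + 1)
             if lab not in border]
    keep = [lab for _, lab in sorted(inner, reverse=True)[:2]]
    mask = np.isin(labels, keep)

    # 3. Hole filling: bright vessels and nodules inside the lungs leave
    #    holes in the thresholded mask.
    mask = ndimage.binary_fill_holes(mask)

    # 4. Contour smoothing: a closing with a disk (assumed stand-in for
    #    the rolling ball) fills concave notches narrower than the disk,
    #    which is where juxta-pleural nodules are cut out.
    r = closing_radius
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    disk = xx ** 2 + yy ** 2 <= r ** 2
    return ndimage.binary_closing(mask, structure=disk)
```

A 3D version would apply the same steps to the whole volume with 3D labeling and a spherical structuring element; the 2D form is kept here for brevity.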
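The multi-scale dot enhancement described in item 2 can be illustrated with the sketch below. It assumes a 2D slice in which nodules are brighter than their surroundings and uses Gaussian derivative filters, which combine the smoothing and differentiation steps. The dot response |λ2|²/|λ1| for pixels where both eigenvalues are negative is one common choice and an assumption here, as are the scale normalisation by σ² and the scales in the hypothetical `sigmas` parameter; the exact filter of Chapter 3 is not reproduced.

```python
import numpy as np
from scipy import ndimage


def dot_enhance(image, sigmas=(1.0, 2.0, 3.0, 4.0)):
    """Multi-scale Hessian-based dot filter for bright blobs in 2D
    (illustrative sketch; response formula is an assumption)."""
    image = np.asarray(image, dtype=float)
    response = np.zeros_like(image)
    for s in sigmas:
        # Scale-normalised second derivatives (Gaussian smoothing is
        # built into the derivative filters).
        hyy = s ** 2 * ndimage.gaussian_filter(image, s, order=(2, 0))
        hxx = s ** 2 * ndimage.gaussian_filter(image, s, order=(0, 2))
        hxy = s ** 2 * ndimage.gaussian_filter(image, s, order=(1, 1))
        # Closed-form eigenvalues of the 2x2 symmetric Hessian.
        mean = 0.5 * (hxx + hyy)
        root = np.sqrt((0.5 * (hxx - hyy)) ** 2 + hxy ** 2)
        e_a, e_b = mean + root, mean - root
        # Order the pair so that |lam2| <= |lam1|.
        swap = np.abs(e_a) < np.abs(e_b)
        lam1 = np.where(swap, e_b, e_a)
        lam2 = np.where(swap, e_a, e_b)
        # Bright dot: both eigenvalues negative and of similar size.
        # A line has lam2 close to 0, so lam2**2 / |lam1| stays small.
        is_dot = (lam1 < 0) & (lam2 < 0)
        with np.errstate(divide="ignore", invalid="ignore"):
            z = np.where(is_dot, lam2 ** 2 / np.abs(lam1), 0.0)
        # Integrate the scales by keeping the maximum response per pixel.
        response = np.maximum(response, z)
    return response
```

On a synthetic slice containing a small disc and a thin straight line, the disc centre receives a strong response while the middle of the line stays close to zero, which is the shape selectivity the text relies on.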
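The rule-based analysis of nodule candidates in item 2 can be expressed as a simple size filter. The limits below are hypothetical, chosen only to match the 3-30 mm nodule range of the LIDC annotations mentioned in item 5; the actual rules of this thesis are not given in the summary. `Candidate`, `keep_candidate` and the helper functions are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Initial measurements of one thresholded object (hypothetical)."""
    area_mm2: float     # largest in-slice cross-sectional area
    diameter_mm: float  # largest in-plane diameter
    volume_mm3: float   # voxel count times voxel volume


def disc_area(d_mm):
    """Area of a disc with diameter d_mm."""
    return math.pi * (d_mm / 2.0) ** 2


def sphere_volume(d_mm):
    """Volume of a sphere with diameter d_mm."""
    return math.pi * d_mm ** 3 / 6.0


def keep_candidate(c, d_min=3.0, d_max=30.0):
    """Keep an object only if its diameter, area and volume all lie
    between those of a disc/sphere of d_min and of d_max (assumed rule)."""
    if not d_min <= c.diameter_mm <= d_max:
        return False
    if not disc_area(d_min) <= c.area_mm2 <= disc_area(d_max):
        return False
    return sphere_volume(d_min) <= c.volume_mm3 <= sphere_volume(d_max)
```

For example, an 8 mm object with a 50 mm² cross-section and a 260 mm³ volume is kept, while a 1 mm speck or a 40 mm structure is discarded before feature extraction.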
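The SVM classification with an exhaustive grid search and k-fold cross-validation (items 4 and 5) can be sketched with scikit-learn. This assumes that candidate features are already collected in a matrix `X` with labels `y` (1 = nodule, 0 = non-nodule). The grid mirrors the ranges shown in Figure 4-10 under the assumption that γ there is a MATLAB-style kernel scale s, which corresponds to scikit-learn's `gamma = 1/s²`; the feature standardisation step, ROC AUC scoring and the names `KERNEL_SCALES`, `PENALTIES` and `tune_rbf_svm` are also assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical grids: kernel scale s (MATLAB-style) and penalty factor C.
KERNEL_SCALES = (0.3, 0.5, 1.0, 2.0, 3.0)
PENALTIES = (1, 2, 3, 4)


def tune_rbf_svm(X, y, k=5, seed=0):
    """Exhaustive grid search over C and gamma for an RBF-kernel SVM,
    scored by ROC AUC under stratified k-fold cross-validation."""
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    grid = {
        "svc__C": list(PENALTIES),
        # exp(-||x - x'||^2 / s^2) equals exp(-gamma * ||x - x'||^2)
        # with gamma = 1 / s^2.
        "svc__gamma": [1.0 / s ** 2 for s in KERNEL_SCALES],
    }
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, grid, cv=cv, scoring="roc_auc")
    search.fit(X, y)  # also refits the best model on all of X, y
    return search


if __name__ == "__main__":
    # Synthetic stand-in for candidate features (not LIDC data).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(1.0, 1.0, (30, 4)),
                   rng.normal(-1.0, 1.0, (30, 4))])
    y = np.array([1] * 30 + [0] * 30)
    for k in (5, 7, 10):
        s = tune_rbf_svm(X, y, k=k)
        print(k, s.best_params_, round(s.best_score_, 3))
```

Standardising inside the pipeline keeps each fold's scaling fitted only on its training part, so the cross-validated scores are not inflated by information from the held-out fold.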

The main contribution of this dissertation is a relatively simple nodule detection scheme that performs very well in an extensive experimental analysis. In addition, the proposed feature set significantly reduces false positives and increases sensitivity, so that the system improves on previous methods in both sensitivity and FP/scan. Moreover, a comparison has been made to determine the most relevant feature class in the extracted feature set.


PERTINENT PUBLICATIONS

Article published in journal

[1] F. Shaukat, G. Raja, A. Gooya, and A. F. Frangi, “Fully automatic detection of lung nodules in CT images using a hybrid feature set,” Med. Phys., vol. 44, no. 7, pp. 3615–3629, Jul. 2017.

Articles under review

[1] F. Shaukat, G. Raja, “Computer Aided Detection of Lung Cancer Nodules: A Review,” (Submitted)

[2] F. Shaukat, G. Raja, “Artificial Neural Network based Classification of Lung Nodules in CT Images Using Shape and Texture Features,” (Submitted)


Chapter 1: INTRODUCTION

1.1 Research Background and Significance

Lung cancer is one of the leading causes of death around the world, with one of the lowest survival rates after diagnosis. The survival rate can be increased by early nodule detection [1]. The disease affects both developed and underdeveloped countries [2]. According to one estimate, 225,000 people are diagnosed with lung cancer every year in the United States, costing $12 billion in health care [1][3]. Another study shows that 433 Americans die of lung cancer every day [4]. The deadliest year in terms of lung cancer mortality was 2005, with a staggering 159,292 deaths. There has since been a mild decline of 2.3%, to 155,610 deaths in 2014. Men have been the major victims of this disease, with a higher age-adjusted rate of 51.7 per 100,000 persons compared with 34.7 per 100,000 persons for women. Rates are almost the same for black and white women, while black men have a higher rate (45.7 per 100,000 persons) than white men (45.4 per 100,000 persons) [5].

The situation is even worse in underdeveloped countries. Lung cancer is the most common type of cancer in Asia, with the highest risk in South East Asia [6]. According to one estimate, lung cancer deaths in Asia rose to the alarming figure of 926,436 out of 1,033,881 cases in 2012, a dismal survival rate of 10.4% [7]. Another study shows that 51% of lung cancer cases occur in Asia [8]. Pakistan is also affected by lung cancer, and the danger is increasing every day: according to a study conducted in January 2014, both lung cancer incidence and mortality are rising in the country. Apart from tobacco, which is the primary cause of this deadly disease, lack of awareness, poor hygienic conditions and meat consumption are the other main contributing factors [9].


The World Health Organization's (WHO) data released in November 2014 state that about 14 million new cancer cases were diagnosed and 8.2 million cancer deaths occurred in 2012. Lung cancer topped the list of cancer deaths with 1.59 million, followed by liver cancer with 745,000; lung cancer deaths therefore exceeded liver cancer deaths by more than 113 percent (1.59 million / 745,000 ≈ 2.13), and they continue to increase every year, making the treatment of lung cancer patients one of the most pressing concerns. According to WHO statistics, the number of lung cancer deaths was 652,842, of which men accounted for 70.3% and women for 29.7%, with lung cancer overtaking breast cancer as the leading cause of cancer mortality in women. One estimate suggests that by 2020 the world population will reach 8 billion, the number of new cancer cases will reach 20 million and the death toll will reach 12 million, with lung cancer, which has the highest mortality rate, posing the biggest threat to human health [8].

Another important factor that makes lung cancer so deadly is its five-year survival rate (17.7%), the lowest among the leading cancers, compared with colon (64.4%), breast (89.7%) and prostate (98.9%) cancer [1]. Early detection is all the more important because the five-year survival rate for localized disease (cancer confined to the lungs) is 55%; unfortunately, only 16% of cases are detected at this early stage, and the survival rate falls to only 4% once the cancer spreads to other organs [4]. Estimates suggest that by 2030, lung cancer will cause around 10 million deaths per year [2].

In view of this situation, a new initiative called the 'Cancer Moonshot' was launched in 2016 to boost research in the prevention, diagnosis and treatment of cancer and to achieve a decade's worth of progress in just five years [10]. According to one study, lung cancer mortality can be reduced by 20% through early detection [11].


1.2 Lung Cancer and Nodules

Lung cancer manifests primarily as cancerous nodules in the lung region or lung periphery. Nodules can be defined as lung tissue abnormalities with a roughly spherical structure and a diameter of up to 30 mm [12,13]. They can be classified into the following categories: well-circumscribed, juxta-vascular, juxta-pleural and pleural tail. Well-circumscribed nodules are solitary nodules with no attachment to neighboring vessels or other anatomical structures. Juxta-vascular nodules are strongly attached to nearby vessels. Juxta-pleural nodules have a portion attached to the nearby pleural surface. Pleural tail nodules have a tail, belonging to the nodule itself, that makes a minute attachment to the nearby pleural wall [14]. Sample images of the different nodule groups can be seen in Figure 1-1. In the following section, we analyze the characteristics of lung nodules that play a key role in their detection.

Figure 1-1: Sample images of four nodule groups (encircled). From left to right, well-circumscribed, juxta-vascular, juxta-pleural and pleural tail nodules.

1.2.1 Imaging Features and Analysis of Pulmonary Nodules CT

The complexity and diversity of lung nodule types make detection quite difficult. In imaging diagnosis, the features of pulmonary nodules can help to infer the nature and type of the lesion, e.g., benign/malignant or primary/secondary bronchial lung cancer, and to differentiate it from other pathological conditions so that an appropriate qualitative diagnosis can be made. The following section describes the key imaging features of lung nodules and the diversity that makes their detection complicated.


(1) Lung nodule size: As in other diseases, nodule diameter is one of the most important primary indicators for distinguishing benign from malignant nodules. The size of a suspected lesion can directly reflect its pathology and is the most intuitive and easiest criterion to judge. By size, pulmonary nodules can be divided into three categories: (i) large nodules, (ii) small nodules and (iii) very small or micro-nodules. Large nodules normally have a diameter of 20 mm to 30 mm; they appear clearly in the image and can be detected easily. Small nodules have a diameter of 10 mm to 20 mm; they are relatively more difficult to detect, with a greater chance of missing a true nodule or producing a false detection than for large nodules. Micro-nodules (very small nodules) have a diameter of 2 mm to 10 mm [15]; detecting them requires precise segmentation and enhancement combined with classification. In general, the larger the nodule, the higher the probability of malignancy, and vice versa.

(2) Location of lung nodules: As lung cancer progresses, it can obstruct the blood vessels or bronchi in the nodule area, causing an insufficient oxygen supply to the lungs and ultimately death. The position of a nodule therefore conveys useful information to the doctor and is also an important index in image diagnostics. By location, nodules can be divided into four types: solitary nodules (solitary pulmonary nodules), pleural-adhesion nodules (juxta-pleural and pleural-tail nodules) and vascular-adhesion nodules (juxta-vascular nodules) [14]. These four types have different characteristics. Solitary nodules lie within the parenchymal area of the lungs, where the density of the surrounding tissue differs from that of the bronchus; however, this type of nodule can easily be confused with a vessel cross-section. Pleural-adhesion nodules are in contact with the pleura and, according to the degree of contact, can be further divided into two types (juxta-pleural and pleural-tail). The degree of adhesion is difficult for doctors to judge: a nodule with too much attachment to the pleura may be mistaken for extrapleural tissue, whereas one lying too far from the pleura may be misjudged as a blood vessel. Vascular-adhesion nodules are attached to an adjacent vessel and, like pleural-adhesion nodules, are difficult to detect and diagnose against the adjacent tissue structure. Because each type has its own detection difficulty, various methods should be combined to detect them precisely.

(3) Pulmonary nodule density: CT values differ from one lung tissue to another, but pulmonary nodules and parenchymal blood vessels have almost the same density. Depending on density, nodules can be divided into solid, part-solid and non-solid nodules. Among them, ground-glass opacity (GGO) nodules are the most difficult to detect because they have relatively low density and a small distribution area and show a blurred, fuzzy shape. These fuzzy nodules also display seemingly benign traits, such as a slow growth rate. Hence, CAD systems often find it difficult to detect them, even though this type of nodule is, in general, associated with malignancy.

(4) Pulmonary nodule edge: The edge contour of a pulmonary nodule is also one of the important indices for distinguishing benign from malignant nodules. Benign nodules usually have a smooth edge with no obvious lobulation or burr (spiculation) phenomenon, whereas malignant nodules tend to have irregular edges with burrs or lobulation.

In summary, pulmonary nodules have diverse characteristics, and the four aspects above explain why they are difficult to detect. Features such as an uneven edge contour with burr signs that extend to the periphery, are scattered and unbranched and end in long, pointed tips, as well as extended and vague hyperemia of the surrounding area, are illustrated by these four aspects. Furthermore, nodule shapes vary: the contour surface is uneven, and a nodule may be connected to similar tissues such as blood vessels. Nodules also vary in size from very large to very small. All of this makes it difficult to determine whether a region is a blood vessel or a nodule, which is a major problem in the detection process.

1.3 Computer Aided Detection

Computer Aided Detection (CAD) can help in early lung nodule detection. In radiology, computer-aided detection refers to procedures that assist doctors in the interpretation of medical images [16]. With the rapid growth of medical imaging technologies, CAD has become increasingly important. Medical imaging allows scientists and physicians to collect potentially life-saving information by peering noninvasively into the human body. As medical imaging plays an increasingly important role in the identification and treatment of disease, the medical image analysis community has focused on the demanding problem of extracting, with the assistance of computers, clinically useful information about anatomic structures imaged through CT, MRI, PET and other modalities. Although modern imaging devices provide excellent views of internal anatomy, the ability of computers to quantify and examine the embedded structures accurately and efficiently remains limited. To support the full spectrum of biomedical research and medical practice, from diagnosis to radiotherapy to surgery, accurate, repeatable, quantitative data must be extracted efficiently. The main idea of CAD is therefore to extract regions of interest with high accuracy [17,18]. CAD has three main objectives:

Improve the quality of diagnosis

Increase therapy success by early detection of cancer

Avoid unnecessary biopsies.

With the development of medical imaging standards, different imaging modalities have been used, including X-ray, CT (Computed Tomography), MRI (Magnetic Resonance Imaging) and PET (Positron Emission Tomography), of which X-ray is the oldest. Computed Tomography provides more detailed and accurate information on anatomic structure and has greatly improved the detection rate of lung cancer nodules. CT has therefore become the most effective diagnostic test for detecting lung cancer [19].

At present, multi-slice spiral CT [20] can detect pulmonary nodules as small as 1 mm in diameter. Multi-slice spiral CT scanners are available with 16, 32 and 64 detector rows, and a 64-row scanner can produce hundreds of images per examination: typically 150 to 200, and as many as a thousand. Reading all these images is highly repetitive work, which places a great burden on the doctor and increases the chance of misdiagnosis due to fatigue. Adding to the risk of misdiagnosis, the lung has a complex structure of vascular and bronchial tissue. These vascular and bronchial structures closely resemble pulmonary nodules (for example, they appear as circular structures in cross-section), and nodules themselves are diverse: they may resemble various types of pulmonary tuberculosis, show lobulation, calcification and other manifestations, and may be solitary or adhere to blood vessels or bronchi. This diversity increases the difficulty of diagnosis.

CAD can assist doctors by taking over much of this repetitive work, improving diagnostic efficiency and accuracy. CAD systems scan digital images, e.g. from computed tomography, for typical appearances and highlight conspicuous regions, such as possible disease. Currently this is done manually, and the results may vary with increasing workload, human error and negligence. This research area has generated a great deal of interest. A CAD system integrates image processing, pattern recognition, machine learning and data mining with doctors' clinical experience of lung cancer, automatically extracting and comparing information to assess a patient's condition. As the technology develops, CAD systems can serve as a second opinion for the doctor's diagnosis [21].

Because doctors are mainly concerned with lesions in the lung parenchyma, which appear in the form of lung nodules, the aim of a lung CAD system is to automatically segment the lung parenchyma from the patient's CT images, extract regions of interest (suspected lesion areas) within the parenchyma, and identify lung nodules in these regions. Whether a region is malignant is then judged in combination with additional diagnostic information and the physician's experience to reach the final diagnostic conclusion. Experimental results show that CAD systems can improve the accuracy of lung cancer diagnosis by at least 15% [22].

Because the five-year survival rate of lung cancer differs with the stage at which it is found, early diagnosis and early treatment are important. However, hospitals face several issues, including (1) hospital costs, (2) the training of experienced radiologists and (3) the diagnostic workload of doctors. With the rapid development of computer technology, CAD systems can be introduced into the medical diagnostic process. The application of CAD to the detection of pulmonary nodules has improved sensitivity and specificity, and its application in lung cancer screening can speed up diagnostic work [21].

CAD is currently an active research field. With the rapid development of computer technologies and the urgent needs of medical diagnosis, CAD systems have gradually become one of the hot research topics in the medical industry and in universities in Europe and the United States of America. CT-based products in commercial use include R2 Technology's Image Checker CT, FDA-certified in 2004 [23], Siemens' syngo.CT Lung CAD [24] and Veolity by Mevis Medical Solutions [25].

1.4 Organization of Thesis

In this dissertation, we propose a novel computer-aided detection scheme for lung nodules that uses a hybrid feature set to increase the overall sensitivity of the system and reduce the number of false positives per scan. This research focuses on the segmentation of the lung parenchyma in DICOM (Digital Imaging and Communications in Medicine) images, followed by nodule candidate detection and the classification of candidates into nodules and non-nodules. First, preprocessing is applied to remove noise from the input CT image. Then a series of operations is performed to segment the main lung region, reducing the viewing area for the doctors. After this, the image is enhanced for better visualization of the region of interest (ROI). Next, candidate nodules are detected and false positives are reduced using a classifier. The main contents of this dissertation are as follows.

The first chapter is the introduction. It describes the background and significance of the subject, the different types of lung nodules and their characteristics, the importance of CAD in lung cancer detection and the research being carried out in this area. Finally, it presents the organization of this thesis.

The second chapter presents a detailed literature review of lung nodule detection methods. A lung nodule detection method normally consists of image acquisition, lung segmentation, nodule detection and classification. This chapter discusses the salient techniques in the literature, analyzes their advantages and disadvantages, and concludes with the problem statement that motivated the work in this thesis.

The third chapter describes in detail the proposed methodology, which begins with automatic segmentation of the lungs from the input CT image and removal of the background. The segmented lung image is smoothed using morphological operations to include any juxta-pleural nodules present in the lung region. The image is then enhanced using a multiscale dot enhancement filter based on the Hessian matrix; this enhancement strengthens all types of nodules to a certain extent. Next, candidate nodules are detected by optimal thresholding of the dot-enhanced images, and a rule-based analysis is applied to retain only good nodule candidates. Finally, a hybrid feature set is extracted from the nodule candidates, and an SVM classifier is used to reduce false positives by classifying the candidates into nodules and non-nodules.
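The idea behind Hessian-based dot enhancement can be illustrated with the following minimal sketch. It is a 2D simplification written for this overview and is not the implementation used in this thesis: it assumes Gaussian smoothing, second derivatives from central finite differences, the dot response |λ2|²/|λ1| taken only where both Hessian eigenvalues are negative (a bright, blob-like structure), σ² scale normalization and a maximum over scales. The function names and the scales used in the example are hypothetical choices.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [wt / total for wt in weights]


def smooth(image, sigma):
    """Separable Gaussian smoothing; indices are clamped at the image border."""
    h, w = len(image), len(image[0])
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    rows = [[sum(k[j] * image[y][min(max(x + j - r, 0), w - 1)] for j in range(len(k)))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j] * rows[min(max(y + j - r, 0), h - 1)][x] for j in range(len(k)))
             for x in range(w)] for y in range(h)]


def dot_response(image, sigma):
    """Single-scale 2D dot filter |l2|^2 / |l1| for bright blobs (l1 <= l2 < 0)."""
    g = smooth(image, sigma)
    h, w = len(g), len(g[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dxx = g[y][x + 1] - 2.0 * g[y][x] + g[y][x - 1]
            dyy = g[y + 1][x] - 2.0 * g[y][x] + g[y - 1][x]
            dxy = (g[y + 1][x + 1] - g[y + 1][x - 1] - g[y - 1][x + 1] + g[y - 1][x - 1]) / 4.0
            # Closed-form eigenvalues of the symmetric 2x2 Hessian, l1 <= l2.
            mean = (dxx + dyy) / 2.0
            spread = math.sqrt(((dxx - dyy) / 2.0) ** 2 + dxy ** 2)
            l1, l2 = mean - spread, mean + spread
            if l2 < 0.0:  # both eigenvalues negative: blob-like bright structure
                out[y][x] = sigma * sigma * (l2 * l2) / abs(l1)
    return out


def multiscale_dot(image, sigmas):
    """Pixel-wise maximum of the scale-normalized dot responses."""
    responses = [dot_response(image, s) for s in sigmas]
    h, w = len(image), len(image[0])
    return [[max(r[y][x] for r in responses) for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    size = 15
    blob = [[0.0] * size for _ in range(size)]
    blob[7][7] = 100.0  # hypothetical small bright "nodule"
    line = [[100.0 if x == 7 else 0.0 for x in range(size)] for _ in range(size)]  # "vessel"
    print(multiscale_dot(blob, [1.0, 2.0])[7][7])
    print(max(max(row) for row in multiscale_dot(line, [1.0, 2.0])))
```

For a bright line such as a vessel, one eigenvalue is close to zero, so the response vanishes, whereas a round blob gives two large negative eigenvalues and a strong response; this is what allows a dot filter to favor nodules over vessels.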

The fourth chapter presents the results and discussion. We have carried out an extensive evaluation of the proposed system on the Lung Image Database Consortium (LIDC) dataset, which is publicly available from The Cancer Imaging Archive (TCIA). We considered 850 scans of this dataset (cases LIDC-IDRI-0001 to LIDC-IDRI-0844), which contain nodules of 3–30 mm fully annotated by four expert radiologists in two consecutive sessions. Compared with previous methods, the overall sensitivity is improved and the number of FP/scan is reduced significantly. Finally, the conclusions and recommendations of the dissertation, which provide the basis for future research, are presented in Chapter 5.
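Sensitivity and FP/scan, the two figures of merit mentioned above, can be computed from per-scan detection counts as in the following sketch. The tuple layout (true positives, false positives, annotated nodules for each scan) is an assumption made for illustration; how a detection is matched to an annotated nodule is not shown here.

```python
from typing import Iterable, Tuple


def detection_summary(per_scan: Iterable[Tuple[int, int, int]]) -> Tuple[float, float]:
    """Return (sensitivity, false positives per scan).

    Each item is (true_positives, false_positives, annotated_nodules) for one scan.
    """
    scans = list(per_scan)
    tp = sum(s[0] for s in scans)
    fp = sum(s[1] for s in scans)
    nodules = sum(s[2] for s in scans)
    sensitivity = tp / nodules if nodules else 0.0
    fp_per_scan = fp / len(scans) if scans else 0.0
    return sensitivity, fp_per_scan


if __name__ == "__main__":
    # Three hypothetical scans; the third contains no annotated nodule.
    print(detection_summary([(2, 3, 2), (1, 5, 2), (0, 1, 0)]))  # prints (0.75, 3.0)
```

Sensitivity is pooled over all nodules rather than averaged per scan, so scans without nodules still contribute their false positives to FP/scan.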


Chapter 2: LITERATURE REVIEW

This chapter presents a detailed literature review of computer-aided lung nodule detection schemes, divided into three sections. After a brief introduction to image acquisition and the commonly available datasets, the first section presents the lung volume segmentation techniques reported in the literature. The second section presents the lung nodule detection techniques reported in the literature. The last section briefly reviews related work on nodule classification and feature extraction, highlighting the challenges that motivated our work in this dissertation. Computer-aided detection (CAD) can play an important role in the early detection of cancer and in increasing detection sensitivity [5,26].

Figure 2-1: Process of lung nodule detection consists of acquiring an image followed by lung

segmentation, nodule detection and false positive reduction.

A complete diagram of the typical lung CAD process is shown in Figure 2-1, and the steps involved are briefly explained below. Image acquisition can be defined as the process of acquiring medical images from imaging modalities [13]. Many methods are available for lung imaging, among which Computed Tomography (CT) stands out as the key modality for the primary analysis of lung nodules in screening. Among the available public databases, the Lung Image Database Consortium (LIDC) [27] stands out because of the standard radiological annotations provided with the images and its widespread use. Other databases are the Early Lung Cancer Action Program (ELCAP) Public Lung Image Database [28] and the ELCAP Public Lung Database to Address Drug Response [29].

2.1 Lung Segmentation

Lung segmentation can be defined as the process of extracting the lung volume from the input CT image and removing the background and other irrelevant components. Lung segmentation is a prerequisite for nodule detection, and accurate segmentation plays an important role in enhancing the efficiency of a lung nodule detection system. Numerous methods have been proposed in the literature for extracting the lung volume from CT images, such as optimal thresholding, rule-based region growing, global thresholding, 3-D adaptive fuzzy thresholding, hybrid segmentation and connected component labeling. After the initial segmentation, the extracted lung volume is refined so that juxta-pleural nodules are included; chain-code methods, the rolling ball algorithm and morphological approaches have generally been used for this purpose [30-38].
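Morphological refinement of a lung mask can be sketched on a small 2D binary mask as follows. This is a generic binary closing (dilation followed by erosion) with a discrete disk, not the exact procedure of any of the cited works; the disk radius is an illustrative choice, and real lung masks would normally be processed in 3D with an image-processing library.

```python
def disk(radius):
    """Offsets (dy, dx) of a discrete disk-shaped structuring element."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def dilate(mask, se):
    """Binary dilation with a symmetric structuring element."""
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in se))
             for x in range(w)] for y in range(h)]


def erode(mask, se):
    """Binary erosion; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                     for dy, dx in se))
             for x in range(w)] for y in range(h)]


def close(mask, radius):
    """Binary closing: fills holes and gaps the structuring element cannot fit into."""
    se = disk(radius)
    return erode(dilate(mask, se), se)


if __name__ == "__main__":
    # 9x9 mask: a 7x7 "lung" square with a one-pixel hole (e.g. a dense vessel or nodule).
    mask = [[1 if 1 <= y <= 7 and 1 <= x <= 7 else 0 for x in range(9)] for y in range(9)]
    mask[4][4] = 0
    print(close(mask, 1)[4][4])  # prints 1: the hole is filled
```

The radius controls how large a gap or indentation can be bridged; too large a radius also merges structures that should stay outside the lung, which is why the choice is usually tuned to the expected nodule size.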

Lung segmentation techniques can be broadly classified into four categories: (i) shape models, (ii) edge-based techniques, (iii) thresholding and (iv) deformable boundaries. In the following sections, we review selected studies from each of these categories.

2.1.1 Shape-Based Techniques

This section presents a group of papers that have used shape-based techniques for lung segmentation. In 2005, Sluimer et al. [34] developed an automatic shape-based lung segmentation method based on registration, in which the pathological scan was elastically registered with a normal scan. The method was evaluated on 26 three-dimensional thin-slice CT scans, of which 10 scans with high-density pathology were used as test data, and the results were compared with manually traced contours as ground truth. An overlap measure of 0.8165 and a mean absolute surface distance of 1.48 mm were achieved.

In 2011, Besbes and Paragios [39] proposed a graph-based shape model with image cues based on boosted features for automatic lung segmentation. The prior constraints were encoded using the normalized Euclidean distance between pairs of control points, and the graph topology was deduced using manifold learning and unsupervised clustering. Segmentation was cast as a labelling task in which extracted image points were matched to model landmarks. An additional outlier label was introduced to overcome the problem of missing correspondences, and the outliers were then repaired to complete the segmentation. The method was evaluated on the segmentation of the right lung using a publicly available dataset of 247 chest radiographs, for which gold-standard segmentations by expert radiologists were available as ground truth. An overlap measure of 0.9474 and a mean absolute surface distance of 1.39 pixels were achieved.

In 2011, Sofka et al. [40] proposed a multi-stage learning method that combines anatomical information to predict the initialization of a statistical shape model of the lungs. The initialization first detects the base of the trachea and uses it to automatically select stable landmarks in the area near the lungs, such as the ribs and spine. These landmarks are used to align the shape model, which is then refined through boundary detection to obtain a fine-grained segmentation. Robustness is achieved using discriminative classifiers arranged hierarchically and trained on manually annotated data of diseased and healthy lungs. The method was evaluated on 260 scans, and the results were compared with 68 contours manually traced by expert radiologists as ground truth. The symmetrical point-to-mesh comparison error (SCD) of the algorithm was 1.95.

In 2012, Sun et al. [41] proposed a robust active shape model (RASM) for automatic lung segmentation. The method consisted of two steps. First, the lung contours were roughly segmented with the robust active shape model, whose initial position was found with a rib cage detection method. In the second step, the segmentation was refined by an optimal surface finding approach. The right and left lungs were segmented individually. The method was evaluated on 30 scans comprising 20 healthy and 40 diseased right or left lungs, and the results were compared with contours manually traced by experts as ground truth. The method achieved a Dice similarity coefficient of 0.975 with a mean absolute surface distance error of 0.84 mm.

In 2014, Mansoor et al. [42] proposed a novel two-stage method for segmenting pathological lungs. In the first stage, fuzzy connectedness was applied to segment the lungs while, in parallel, rib cage information was used to estimate the lung volume. The two lung volumes were then compared, with any difference between them indicating the presence of pathology. In the second stage, texture-based features were computed to refine the segmentation and include any abnormalities missed in the first stage. In addition, a neighboring-anatomy-based approach was used to include weak, low-density abnormal structures and juxta-pleural nodules. The method was evaluated on both public and private datasets. It produced an average overlap score of 95% on a private dataset of 400 CT scans, with 96.84% sensitivity and 92.27% specificity. To validate the results, it was also evaluated on the publicly available Lobe and Lung Analysis 2011 (LOLA11) challenge dataset of 55 scans, where it achieved a mean overlap score of 0.955, computed for the right and left lungs separately.

In 2015, Dai et al. [43] proposed a lung segmentation method using the graph cut algorithm and Gaussian mixture models (GMMs). The method removed the need for lung contour smoothing post-processing such as morphological operations or the rolling ball algorithm. It first modelled the foreground and background as GMMs, and the expectation-maximization (EM) algorithm was used to calculate the weight of each pixel belonging to the foreground. These weights were used to build the corresponding graph, and the segmentation was completed using minimum cut theory. The method was evaluated on chest CT images provided by the General Hospital of Ningxia Medical University, and the results were compared with manual ground truth by expert radiologists. It achieved a mean Dice similarity coefficient of 0.9874 with a standard deviation of 0.0070.
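The EM step described above can be illustrated with a two-component, one-dimensional Gaussian mixture fitted to pixel intensities, returning for each pixel the posterior probability of the darker (lung) component. This is a simplified sketch under stated assumptions, not the method of Dai et al.: a single Gaussian per class rather than full mixtures, intensity as the only feature, initialization at the minimum and maximum intensity, and no graph construction or minimum cut. The function name and the example intensities are hypothetical.

```python
import math


def em_foreground_posterior(values, iterations=50, min_var=1e-6):
    """Fit a 2-component 1D GMM by EM; return P(dark component | value) for each value."""
    n = len(values)
    mu = [float(min(values)), float(max(values))]  # component 0: dark (lung/air)
    mean = sum(values) / n
    var0 = max(sum((v - mean) ** 2 for v in values) / n, min_var)
    var = [var0, var0]
    weight = [0.5, 0.5]
    post = [0.5] * n
    for _ in range(iterations):
        # E-step: responsibilities computed in log space to avoid underflow.
        for i, v in enumerate(values):
            logp = [math.log(weight[k]) - 0.5 * math.log(2.0 * math.pi * var[k])
                    - (v - mu[k]) ** 2 / (2.0 * var[k]) for k in (0, 1)]
            m = max(logp)
            e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
            post[i] = e0 / (e0 + e1)
        # M-step: re-estimate mixing weights, means and variances.
        for k in (0, 1):
            r = post if k == 0 else [1.0 - p for p in post]
            nk = max(sum(r), 1e-12)
            weight[k] = nk / n
            mu[k] = sum(ri * v for ri, v in zip(r, values)) / nk
            var[k] = max(sum(ri * (v - mu[k]) ** 2 for ri, v in zip(r, values)) / nk, min_var)
    return post


if __name__ == "__main__":
    pixels = [-900, -880, -860, 20, 40, 60]  # illustrative lung vs. soft-tissue HU values
    print([round(p, 3) for p in em_foreground_posterior(pixels)])
```

In a graph cut formulation, posteriors of this kind would typically feed the terminal (data) edge weights, while a separate smoothness term links neighboring pixels.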

In 2017, Soliman et al. [44] proposed a shape-based automatic lung segmentation method using adaptive appearance-guided shape modelling. The model consisted of two visual appearance submodels and an adaptive shape submodel, which together form a spatially inhomogeneous 3D Markov-Gibbs random field (MGRF). The first-order visual submodel specifies local and global signal properties using the input signal and its Gaussian scale space (GSS) filtered versions. A linear combination of discrete Gaussians (LCDG) was used to closely approximate the empirical probability distribution of each signal, and each approximation was separated into two LCDGs representing the lungs and the background. The second-order visual submodel quantifies the intensity dependencies in both the original and GSS-filtered images. The shape submodel, learned from the training dataset, adapts the shape appearance during the segmentation phase. The method was evaluated on three datasets: one private dataset of 30 CT scans and the two publicly available datasets “VESSEL” and LOLA11, with 20 and 55 CT scans, respectively. Compared with the ground truth, an overlap ratio of 0.98 was achieved.

2.1.2 Edge-Based Techniques

The following section presents studies that have used edge-based techniques for lung segmentation. In 2004, Mendonca et al. [45] proposed an automatic method for 2D lung segmentation using edge detection in the spatial domain. The segmentation was completed in two stages. In the first stage, two regions of interest (ROIs) were determined, each indicating a lung field. In the second stage, each ROI was analyzed for accurate detection of the lung border; for this purpose, the image was first smoothed with a 9×9 averaging filter. The method was evaluated on 47 chest radiographs, and the results were compared with ground truth obtained by manual tracing of lung borders by an experienced radiologist. The achieved sensitivity was 92.25%.

In 2005, Yim et al. [46] proposed a lung volume segmentation method based on region growing and connected component labelling. The method first extracted the lung region and airways via inverse seeded region growing and connected component labelling. In the second step, the trachea and airways were delineated from the lungs by three-dimensional region growing, with median filtering used for preprocessing and morphological operations for post-processing. Finally, the lung contours were extracted by subtracting the result of the second step from that of the first. The method was evaluated on 10 subjects, and the results were compared with 10 manually traced contours as ground truth. The root mean square difference between the method and the ground truth was 1.2 pixels.
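Seeded region growing, one of the building blocks named above, can be sketched as a breadth-first search that adds 4-connected neighbors whose intensity lies in an acceptance interval. This is a plain 2D version that assumes a fixed Hounsfield-unit window; it does not reproduce the inverse region growing or the 3D processing of the cited method, and the window and intensities in the example are illustrative.

```python
from collections import deque


def region_grow(image, seed, low, high):
    """Return a boolean mask of pixels 4-connected to `seed` with low <= value <= high."""
    h, w = len(image), len(image[0])
    grown = [[False] * w for _ in range(h)]
    sy, sx = seed
    if not low <= image[sy][sx] <= high:
        return grown  # seed itself is outside the acceptance window
    grown[sy][sx] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not grown[ny][nx]
                    and low <= image[ny][nx] <= high):
                grown[ny][nx] = True
                queue.append((ny, nx))
    return grown


if __name__ == "__main__":
    # Two air-filled regions (about -1000 HU) separated by soft tissue (about 40 HU).
    ct = [
        [-1000, -1000, 40, -1000, -1000],
        [-1000, -1000, 40, -1000, -1000],
        [40, 40, 40, 40, 40],
    ]
    mask = region_grow(ct, (0, 0), -1100, -400)
    print(sum(map(sum, mask)))  # prints 4: only the left air region is reached
```

Connected component labelling follows the same traversal, repeated from every unvisited foreground pixel with a new label each time.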

In 2006, Campadelli et al. [47] developed an automatic lung segmentation method using a spatial edge detector. After segmentation, the image was enhanced using a multiscale method to increase the visibility of nodules, and a support vector machine (SVM) classifier with Gaussian and polynomial kernel functions was used to reduce false positives. The method was evaluated on a large set of postero-anterior (PA) chest radiographs, and the results were compared with the ground truth. The system achieved a highest sensitivity of 92% at 8 FP/image.

In 2007, Korfiatis et al. [48] proposed an automatic lung segmentation method based on a wavelet edge detector. A two-dimensional wavelet edge highlighter was used as a preprocessing step to delineate the lung contour. After the lung contours were outlined, the lung volume was extracted using three-dimensional gray-level thresholding with the minimum error technique. After lung volume extraction, 3D morphological closing with a spherical structuring element was applied to deal with the mediastinal border. The method was evaluated on 23 scans from the LIDC dataset, and the results were compared with manually traced lung contours. An overlap measure of 0.983 and a mean absolute surface distance of 0.77 mm were achieved, and the root mean square difference between the method and the ground truth was 0.52 mm.

2.1.3 Thresholding-Based Techniques

The following section summarizes the thresholding-based techniques reported in the literature. In 2001, Hu et al. [49] developed a fully automatic lung segmentation technique using an iterative threshold method. The optimal threshold was obtained iteratively: starting from an initial value, the threshold was updated repeatedly until it no longer changed. This method takes advantage of the fact that different structures in lung CT images have different densities. After segmentation, the left and right lungs were separated using dynamic programming, and the segmented lung volume was then smoothed using morphological operations. The algorithm was evaluated on eight 3D CT scans, and the results were compared with borders manually traced by two expert radiologists. The root mean square error between the automated segmentation and the manually traced borders, averaged over all volumes, was 0.8 pixels (0.54 mm).
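Iterative threshold selection of this kind can be sketched as follows, assuming the common isodata-style update in which the new threshold is the midpoint of the mean intensities below and above the current threshold, T(i+1) = (mean below T(i) + mean at or above T(i)) / 2. The starting value, tolerance and example intensities are illustrative assumptions, not the parameters used by Hu et al.

```python
def iterative_threshold(values, t0=None, tol=0.5, max_iter=100):
    """Isodata-style threshold: repeat T <- (mean below T + mean at/above T) / 2 until stable."""
    t = (min(values) + max(values)) / 2.0 if t0 is None else float(t0)
    for _ in range(max_iter):
        below = [v for v in values if v < t]
        above = [v for v in values if v >= t]
        if not below or not above:
            break  # threshold lies outside the data range; no split is possible
        t_new = (sum(below) / len(below) + sum(above) / len(above)) / 2.0
        if abs(t_new - t) < tol:  # converged: the threshold no longer changes
            return t_new
        t = t_new
    return t


if __name__ == "__main__":
    # Illustrative HU samples: lung parenchyma/air versus soft tissue.
    samples = [-1000, -950, -900, 0, 50, 100]
    print(iterative_threshold(samples, t0=-980))  # converges to -450.0
```

Because each update only needs the two class means, the loop typically converges in a handful of iterations even on full CT volumes.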

In 2007, Gao et al. [50] proposed an accurate, fully automated lung segmentation technique based on thresholding. Initially, anisotropic diffusion was applied to smooth the image, and the large airways were removed from the input CT image by region growing. Then an optimal thresholding technique was used to segment the lung volume, and the left and right lungs were separated using a tracking algorithm. Finally, the lung contour was smoothed using the rolling ball algorithm. The method was evaluated on eight CT scans of four patients, and the results were compared with manually traced lung contours. The Dice similarity coefficient of the method was 0.9946.

In 2016, Shi et al. [51] proposed an automatic lung segmentation method based on thresholding, consisting of a series of steps. First, the input CT image was filtered with a guided filter to remove noise, and the image was then thresholded using Otsu's method. The thorax region was extracted using region growing, and a seed-based random walk algorithm was used to segment the lung region from the thorax. Finally, the image was smoothed using a curvature-based correction method to include juxta-pleural nodules. The method was evaluated on 23 scans comprising 883 2D slices, and the results were compared with ground truth manually traced by expert radiologists. An overlap ratio of 98.4% was achieved.
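Otsu's method, used above to threshold the filtered image, selects the threshold that maximizes the between-class variance of the two resulting intensity classes. A minimal sketch over a list of integer intensities follows; returning the midpoint between the last level of the lower class and the first level of the upper class is an arbitrary convention chosen for this example.

```python
from collections import Counter


def otsu_threshold(values):
    """Return the threshold maximizing between-class variance (class 0: values < threshold)."""
    counts = Counter(values)
    levels = sorted(counts)
    total = len(values)
    grand_sum = float(sum(values))
    best_t, best_score = float(levels[0]), -1.0
    w0, sum0 = 0, 0.0
    for i in range(len(levels) - 1):
        w0 += counts[levels[i]]
        sum0 += levels[i] * counts[levels[i]]
        w1 = total - w0
        mu0, mu1 = sum0 / w0, (grand_sum - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score = score
            best_t = (levels[i] + levels[i + 1]) / 2.0
    return best_t


if __name__ == "__main__":
    print(otsu_threshold([1, 2, 3, 100, 101, 102]))  # prints 51.5
```

Unlike the iterative midpoint rule, Otsu's criterion evaluates every candidate split once, so it does not depend on an initial threshold.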

2.1.4 Deformable Boundary Techniques

This section presents a group of papers that have used deformable boundary models for lung segmentation. In 2008, Shi et al. [52] proposed a shape-based deformable model for automatic lung segmentation. First, the deformable model used scale-invariant feature transform (SIFT) features, which are more descriptive than other feature classes such as intensity and gradient. Second, both population-based and patient-specific shape statistics were used to segment the lung fields from the chest radiographs, which yielded better and more robust results. Hierarchical PCA, rather than global PCA, was used to learn the patient-specific shape statistics; its advantage is that it increases the degrees of freedom and can capture shape diversity more accurately even with a small number of training samples. The algorithm was evaluated in two phases. First, it was evaluated without patient-specific shape statistics on the Japanese Society of Radiological Technology (JSRT) database of 247 chest radiographs. In the second phase, the complete algorithm, with patient-specific shape statistics, was evaluated on 39 serial frontal chest radiographs. The results were compared with manually traced ground truth, and an overlap measure of 0.92 with a mean absolute surface distance of 1.78 pixels was achieved.

In 2008, El-Baz et al. [53] proposed a fully automatic lung segmentation method based on a statistical Markov-Gibbs random field (MGRF) model. A linear combination of discrete Gaussians (LCDG) with positive and negative components was used to better approximate the empirical distribution of each signal, and the conventional expectation-maximization (EM) algorithm was modified to handle the LCDG. The method was evaluated on ten different real datasets, and the results were compared with 1820 manually traced images as ground truth. The system achieved an accuracy of 96.8%.

In 2010, Annangi et al. [54] developed a shape-based deformable model for automatic lung segmentation. The problem of local minima when using active contours for lung segmentation was addressed with a multiscale feature set that exploits the good contrast present at the lung boundary. In the feature computation phase, an edge map was obtained by applying the Canny edge detector to the histogram-equalized image. The method was evaluated on 1130 images with ground truth marked by expert radiologists. The Dice similarity coefficient of the method was 0.88 with a standard deviation of 0.07.

In 2017, Filho et al. [55] proposed an automatic lung segmentation method using the 3D Adaptive Crisp Active Contour Method (3D ACACM). Initially, a sphere was placed within the lung and deformed by outward-acting forces, with the energy function of the deformable model minimized iteratively. The main contributions of this method were the 3D Adaptive Crisp external energy, used to detect the edges of the lungs, and the 3D Adaptive Balloon internal energy, used to expand the segmentation; the topology of each point and information from neighboring slices were used to calculate this force. A robust 3D automatic initialization technique was also proposed to place the seed points in the right and left lungs automatically. The method was evaluated on 40 CT scans, and the results were compared with commonly used approaches such as 3D region growing and the level set algorithm, and with semi-automatic segmentation by an expert. The method achieved an F-measure of 99.14% ± 0.18, where the F-measure (FM) denotes the harmonic mean of positive predictive value and sensitivity. We have summarized the lung segmentation techniques in Table 2-1.
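The overlap-based measures reported in Table 2-1 can be computed from a binary segmentation and a binary ground truth as in the following sketch, which assumes both masks are flattened to sequences of equal length. It also makes a useful identity visible: when the F-measure is computed from positive predictive value and sensitivity, it equals the Dice similarity coefficient, since both reduce to 2TP / (2TP + FP + FN).

```python
def overlap_metrics(segmentation, reference):
    """OM (Jaccard), DSC, sensitivity, positive predictive value and F-measure."""
    if len(segmentation) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for s, r in zip(segmentation, reference) if s and r)
    fp = sum(1 for s, r in zip(segmentation, reference) if s and not r)
    fn = sum(1 for s, r in zip(segmentation, reference) if r and not s)
    om = tp / (tp + fp + fn) if tp + fp + fn else 1.0            # intersection / union
    dsc = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0   # 2 * intersection / (|A| + |B|)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    fm = 2 * ppv * sensitivity / (ppv + sensitivity) if ppv + sensitivity else 0.0
    return {"OM": om, "DSC": dsc, "sensitivity": sensitivity, "PPV": ppv, "FM": fm}


if __name__ == "__main__":
    seg = [1, 1, 1, 0, 0]  # hypothetical flattened segmentation
    ref = [1, 1, 0, 1, 0]  # hypothetical flattened ground truth
    print(overlap_metrics(seg, ref))
```

Because OM = DSC / (2 - DSC), the two overlap scores rank methods identically, but OM values are always lower, which should be kept in mind when comparing rows of the table that report different measures.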

Table 2-1: Review of Lung Segmentation Techniques*

| CAD System | Year | No. of Cases | Image Size | Proposed Technique | Ground Truth | Performance |
|---|---|---|---|---|---|---|
| Soliman et al. [44] | 2017 | 105 | 512 × 512 × 270–450 | Shape-based | 75 manually traced scans | OM = 0.98, DSC = 98.4% |
| Filho et al. [55] | 2017 | 40 CT scans | 512 × 512 | Shape-based deformable model | Semi-automatic (manual + commercial software) | FM = 99.14% |
| Shi et al. [51] | 2016 | 23 CT scans | 512 × 512 | Thresholding | 23 manually traced data | OM = 0.984 |
| Dai et al. [43] | 2015 | NA | 512 × 512 × 368 | Shape-based | Manually traced data | DSC = 0.9874 |
| Mansoor et al. [42] | 2014 | 400 CT images | NA | Shape-based | 400 manually traced data | OM = 0.955 |
| Sun et al. [41] | 2012 | 30 scans | 512 × 512 × 424–642, 0.6–0.7 mm thin | Shape-based | 30 manually corrected traced data | DSC = 0.975, AD = 0.84 mm |
| Sofka et al. [40] | 2011 | 260 scans | 0.5–5.0 mm | Shape-based | 68 manually traced data | SCD = 1.95 |
| Besbes and Paragios [39] | 2011 | 247 image radiographs | 256 × 256, 1 mm thin | Shape-based | 123 manually traced data | OM = 0.9474, AD = 1.39 pixel |
| Annangi et al. [54] | 2010 | 1130 image radiographs | 128 × 128 and 256 × 256 | Shape-based deformable model | 1130 manually traced images | DSC = 0.88 |
| El-Baz et al. [53] | 2008 | 10 image datasets | 512 × 512 × 182, 2.5 mm thin | Statistical MGRF model | 1820 manually traced images | Accu. = 0.968 |
| Shi et al. [52] | 2008 | 247 image radiographs | 256 × 256 | Shape-based deformable model | 247 manually traced images | OM = 0.92, AD = 1.78 pixel |
| Gao et al. [50] | 2007 | 8 subjects | 512 × 512 × 240 | Thresholding | 8 manually traced datasets | DSC = 0.9946 |
| Korfiatis et al. [48] | 2007 | 23 scans | 512 × 512 | Wavelet edge detector | 22 manually traced data | OM = 0.983, AD = 0.77 mm |
| Campadelli et al. [47] | 2006 | 487 image radiographs | 256 × 256 | Spatial edge detector | 487 manually traced data | Sen. = 0.9174, Spec. = 0.9584 |
| Sluimer et al. [34] | 2005 | 26 scans | 512 × 512, 0.75–2.0 mm | Shape-based | 10 manually traced data | OM = 0.8165, AD = 1.48 mm |
| Yim et al. [46] | 2005 | 10 subjects | 3D, 512 × 512, 0.75–2 mm thin | Region growing | 10 manually traced data | RmsD = 1.2 pixel |
| Mendonca et al. [45] | 2004 | 47 image radiographs | 2D, NA | Spatial edge detector | 47 manually traced data | Sen. = 0.9225 |
| Hu et al. [49] | 2001 | 24 datasets | 512 × 512, 3 mm thin | Iterative threshold | 229 manually traced images | RmsD = 0.54 mm |

* NA: not available. OM: overlap measure, defined as the volume of the intersection divided by the volume of the union of two samples. DSC: Dice similarity coefficient, used for comparing the similarity of two samples (twice the volume of the intersection divided by the sum of the two volumes). FM: F-measure, the harmonic mean of positive predictive value and sensitivity. RmsD: root mean square difference of the distance between the segmentation and the ground truth. SCD: symmetrical point-to-mesh comparison error. AD: mean absolute surface distance, a symmetric border positioning measure integrated along the entire surfaces. Sen.: sensitivity. Spec.: specificity. Accu.: accuracy.

In summary, each technique has its own advantages and disadvantages. Threshold-based techniques work very well on high-contrast CT images, but their performance can degrade for low-contrast pathologies. Thresholding can also be affected by different imaging protocols and image acquisition scanners. Moreover, lung structures such as blood vessels, bronchioles and bronchi have densities so close to those of the surrounding chest tissues that it is very difficult to threshold the region of interest accurately, and special post-segmentation processing is required for accurate segmentation.

segmentation. Deformable boundary based techniques have the disadvantage of extra

sensitivity of initialization. Further they are unable to overcome the inhomogeneity of lung

volume with the use of traditional external forces like edges and gray levels. Hence it becomes

difficult to achieve the accurate lung segmentation by guiding the deformable model. The

accuracy of shape-based segmentation techniques depends on the accurate registration of prior

shape-model with respect to the CT image. Poor registration in this regard can affect the overall

performance and it is the main limitation of shape based techniques. Further the diversity of

lung pathologies makes it difficult to accurately segment the lung fields [56].

2.2 Lung Nodule Detection

Nodule detection can be defined as the process of detecting suspicious areas in the lung region that may indicate lung cancer. It is performed after lung segmentation, which reduces the workload by removing the background and unwanted areas from the input CT image. Various methods have been presented in the literature for lung nodule candidate detection, among which multiple gray-level thresholding stands out. Shape-based methods, template-matching-based methods, morphological approaches with convexity models and filtering-based methods have also been used for this purpose. In the following section, we review selected studies on lung nodule candidate detection.

This section presents studies that have used different variants of thresholding for lung nodule candidate detection. Akram et al. [57], Ko and Betke [58] and Zhao et al. [59] applied multiple gray-level thresholding for nodule candidate detection. They argued that a single threshold value cannot be used, because vessels and different types of nodules may have different density values; multiple threshold values were therefore calculated for candidate nodule detection. Choi and Choi [12] used a multi-scale dot enhancement filter for lung nodule candidate detection. They argued that since nodules appear as circular or dot-like objects of varying size, a single scale cannot appropriately enhance all nodules, whereas a multi-scale dot enhancement filter can efficiently detect the candidate nodules. After enhancement, the lung nodules were detected using thresholding. Gonçalves et al. [60], Chen et al. [61] and Li and Doi [62] proposed Hessian matrix based approaches for lung nodule detection. Gonçalves et al. [60] proposed the use of the central adaptive medialness principle for lung nodule identification and segmentation, using the shape index and curvedness properties to identify lung nodule candidates. The method was evaluated on 569 solid nodules of the LIDC-IDRI dataset and showed good results when compared with the ground truth of manual segmentation by expert radiologists. Choi and Choi [18] proposed an entropy-based lung nodule detection system consisting of three stages. In the first stage, the input CT image is divided into informative and non-informative blocks, and the non-informative blocks are filtered out. In the next stage, the candidate nodules are detected from the informative blocks: these blocks are first enhanced using 3-D coherence-enhancing diffusion, and the candidate nodules are then detected from the enhanced blocks using optimal thresholding. Finally, certain features are extracted from the lung nodule candidates and an SVM is used for false positive reduction.
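The idea behind multiple gray-level thresholding can be illustrated with a minimal sketch; it is written in the spirit of the approaches above but does not reproduce any of them. The image is thresholded at several levels inside the lung mask, connected components within an assumed area range are kept at each level, and candidates whose centroids coincide across levels are merged. The threshold list, the area limits, the merge distance and the function names are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage


def multi_threshold_candidates(image, lung_mask, thresholds, min_area=3, max_area=2000):
    """Return (threshold, centroid, area) for components found at each gray level."""
    candidates = []
    for t in thresholds:
        binary = (image >= t) & lung_mask          # bright structures inside the lungs
        labels, n = ndimage.label(binary)          # connected components at this level
        if n == 0:
            continue
        idx = np.arange(1, n + 1)
        weights = binary.astype(float)
        areas = ndimage.sum(weights, labels, idx)
        centroids = ndimage.center_of_mass(weights, labels, idx)
        for area, c in zip(areas, centroids):
            if min_area <= area <= max_area:       # drop specks and very large regions
                centroid = tuple(float(round(x, 1)) for x in c)
                candidates.append((t, centroid, int(area)))
    return candidates


def merge_candidates(candidates, merge_dist=3.0):
    """Keep one candidate per location; later ones within merge_dist are duplicates."""
    kept = []
    for cand in candidates:
        if all(np.linalg.norm(np.subtract(cand[1], k[1])) > merge_dist for k in kept):
            kept.append(cand)
    return kept
```

Because a dense nodule stays connected over a range of thresholds while a faint one only appears at the lower levels, sweeping several levels is what lets a single pass pick up candidates of different densities.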

This section groups studies that have used different variants of template matching for lung nodule candidate detection. Hasanabadi et al. [63], Wiemker et al. [64] and Lee et al. [65] proposed lung nodule detection systems using template matching. The method of Hasanabadi et al. [63] consists of three main steps. First, the lung is segmented from the input CT image using thresholding and morphological operations. In the second step, lung nodule candidates are detected using template matching and thresholding: sixty nodule patterns were extracted from the LIDC dataset, a similarity measure was computed between each detected region and these templates, and regions exceeding a certain threshold were marked as nodule candidates. Finally, in the third step, false positives were reduced using a feed-forward neural network classifier. The system was evaluated using 7 CT scans from the LIDC dataset. El-Baz et al. [66] also used 2D and 3D deformable templates and a genetic optimization algorithm to detect lung nodule candidates.
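The similarity test at the heart of template matching can be sketched with zero-mean normalized cross-correlation (NCC). NCC is an assumed choice here, because the text does not name the similarity measure used by Hasanabadi et al. [63]; the templates are also assumed to share one size, and the function names and the default threshold of 0.7 are hypothetical.

```python
import numpy as np


def ncc_map(image, template):
    """Zero-mean normalized cross-correlation of `template` at every valid position."""
    th, tw = template.shape
    t = template.astype(float) - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out_h, out_w = image.shape[0] - th + 1, image.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            p = image[i:i + th, j:j + tw].astype(float)
            p = p - p.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # Flat patches (denom == 0) carry no shape information: score 0.
            scores[i, j] = (p * t).sum() / denom if denom > 0 else 0.0
    return scores


def match_candidates(image, templates, threshold=0.7):
    """Top-left positions whose best NCC over all (equal-sized) templates >= threshold."""
    best = np.max([ncc_map(image, tpl) for tpl in templates], axis=0)
    return np.argwhere(best >= threshold), best
```

NCC is insensitive to linear changes in brightness and contrast, which is why it is a common default when nodule templates and CT regions differ in absolute density.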

This section presents studies that have used different morphological approaches for lung nodule candidate detection. Cascio et al. [67] proposed a lung nodule detection method using a 3D mass-spring model. The system used region growing and morphological operations for lung volume segmentation, and the lung nodule candidates were detected using a 3D mass-spring model, whose range of gray values and shape information helped to identify the candidates accurately. The system was evaluated using 84 scans of the LIDC dataset. Soltaninejad et al. [68] proposed a lung nodule detection scheme using an active contour and a KNN classifier. The scheme segments the lung volume using adaptive thresholding and morphological operations; lung nodule candidates are detected using 2D stochastic features and extracted using active contour modeling. Finally, false positives are reduced using a KNN classifier. Jiantao et al. [69] proposed a shape-based lung nodule detection method consisting of three main steps: modeling, break and repair. Initially, the regions of interest are extracted and represented as a shape model using the Marching Cubes algorithm, and problematic regions that can lead to inaccurate segmentation of the object are identified and removed using principal curvature analysis. Finally, the incomplete regions are fitted by interpolation and extrapolation using radial basis functions, so that the suspicious area is estimated and repaired smoothly. The system was evaluated using 230 chest CT scans. Kubota et al. [70] proposed a lung nodule detection method using morphological operations and convexity models. The system consists of multiple stages. Initially, the lung volume is extracted using voxel transformation and figure-ground separation, which includes the removal of any opacity from the lung volume of the input CT image. A Euclidean distance map is then used to locate the seed point, and region growing is applied to identify the candidate nodule region. Finally, the candidate lung nodules are segmented using a convex hull. The system was evaluated using different subsets of the LIDC dataset. Agam and Armato [79], Awai et al. [80], Fetita et al. [81], Tanino [82] and Ezoe et al. [83] also applied different morphological operations to detect lung nodule candidates in their systems. The techniques reported in the literature for lung nodule detection are summarized in Table 2-2.
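The seed-selection and region-growing steps described for Kubota et al. [70] can be sketched as follows. This is a simplified 2-D illustration rather than their implementation: the seed is taken as the pixel with the largest Euclidean distance to the background of a candidate mask, and growth accepts 4-connected neighbours whose intensity lies within an assumed tolerance of the seed intensity. The convex-hull step is omitted, and the function names are hypothetical.

```python
from collections import deque

import numpy as np
from scipy import ndimage


def seed_from_distance_map(mask):
    """Pixel deepest inside `mask` (largest Euclidean distance to the background)."""
    dist = ndimage.distance_transform_edt(mask)
    return np.unravel_index(np.argmax(dist), dist.shape)


def region_grow(image, seed, tol):
    """4-connected region growing: accept neighbours within `tol` of the seed value."""
    h, w = image.shape
    region = np.zeros((h, w), dtype=bool)
    ref = float(image[seed])
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not region[nr, nc]
                    and abs(float(image[nr, nc]) - ref) <= tol):
                region[nr, nc] = True
                queue.append((nr, nc))
    return region
```

Placing the seed at the maximum of the distance map keeps it away from the object boundary, where partial-volume intensities would make the growth criterion unreliable.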

Table 2-2: Review of Lung Nodule Detection Methods

| CAD Systems | Year | Detection Technique |
|---|---|---|
| Akram et al. [57], Ko and Betke [58], Zhao et al. [59] | 2016, 2001, 2004 | Multiple Gray-Level Thresholding |
| Choi and Choi [12] | 2014 | Multi-Scale Dot Enhancement Filter |
| Gonçalves et al. [60], Chen et al. [61], Li and Doi [62] | 2016, 2012, 2004 | Hessian Matrix Based Method |
| Choi and Choi [18] | 2013 | Entropy Analysis |
| Hasanabadi et al. [63], Wiemker et al. [64], Lee et al. [65] | 2014, 2002, 2001 | Template Matching |
| El-Baz et al. [66] | 2013 | Template Matching and Genetic Algorithm |
| Cascio et al. [67] | 2012 | Stable 3D Mass-Spring Models |
| Soltaninejad et al. [68] | 2012 | Active Contour and K-Nearest Neighbors (K-NN) Classifier |
| Jiantao et al. [69] | 2011 | Thresholding and Geometric Modeling |
| Kubota et al. [70] | 2011 | Convexity Model and Morphological Approach |
| Riccardi et al. [71] | 2011 | 3D Fast Radial Transform |
| Namin et al. [72], Murphy et al. [73] | 2010, 2007 | Shape Index |
| Ozekes et al. [74] | 2008 | 3D Template Matching |
| Ge et al. [75] | 2005 | Adaptive Weighted K-Means Clustering |
| Mendonca et al. [76], Chang et al. [77], Paik et al. [38], Takizawa et al. [78] | 2005, 2004, 2004, 2003 | 3D Cylindrical and Spherical Filters |
| Agam and Armato [79], Awai et al. [80], Fetita et al. [81], Tanino [82], Ezoe et al. [83] | 2005, 2004, 2003, 2003, 2002 | Morphological Operators |
| Saita et al. [84], Oda et al. [85] | 2004, 2002 | 3D Connected-Component Labelling |
| Yamada et al. [86], Gurcan et al. [87], Kubo et al. [88] | 2003, 2002, 2002 | Clustering |
| Mekada et al. [89] | 2003 | Maximum Distance Inside a Connected Component |
| Brown et al. [90] | 2001 | Patient-Specific a Priori Model |
| Kawata et al. [91] | 2001 | Linear Discriminant Functions |

In summary, the most commonly used lung nodule detection techniques can broadly be classified into three categories: (i) thresholding, (ii) template matching and (iii) morphological approaches. Each technique has its own pros and cons. Thresholding-based techniques face the major issue of threshold value adjustment. Template matching techniques suffer from the irregular shapes and diversity of lung nodules, and their spherical and cylindrical shape assumptions make it difficult to detect nodules attached to the pleura and vessels efficiently. Morphological approaches suffer from low detection efficiency for nodules attached to the lung wall.

2.3 False Positive Reduction

After nodule candidate detection, the candidates must be classified into nodules and non-nodules. In the literature, this step is commonly referred to as false positive reduction, and it comprises two steps: (i) feature extraction and (ii) classification of candidate nodules into nodules and non-nodules. Several methods of extracting image features and classifying nodules have been proposed in the literature. The most commonly used features are intensity-based statistical features, geometric features and gradient features [30,92]. Using the extracted feature vectors, nodules are detected through various supervised and unsupervised classifiers with a reduced number of false positives [31-33,35,93-95]. In some methods, nodules are detected with pixel/voxel-based machine learning without explicit feature calculation [96-98].
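As an illustration of the intensity-based statistical and geometric features mentioned above, the following sketch computes a handful of them for a single 2-D candidate mask. The particular feature list, the boundary-pixel count used as a crude perimeter and the function name `candidate_features` are assumptions made for this example, not the feature set of any cited study.

```python
import numpy as np
from scipy import ndimage


def candidate_features(image, mask):
    """Intensity statistics and simple shape measures for one 2-D candidate region."""
    mask = mask.astype(bool)
    vals = image[mask].astype(float)
    mean, std = vals.mean(), vals.std()
    # Skewness of the intensity distribution (defined as 0 when std is 0).
    skew = float(((vals - mean) ** 3).mean() / std ** 3) if std > 0 else 0.0
    area = int(mask.sum())
    # Boundary pixels: mask pixels removed by one 4-connected erosion (crude perimeter).
    boundary = int((mask & ~ndimage.binary_erosion(mask)).sum())
    return {
        "mean": float(mean),
        "std": float(std),
        "skewness": skew,
        "min": float(vals.min()),
        "max": float(vals.max()),
        "area": area,
        "boundary_pixels": boundary,
        # Diameter of a disk with the same area as the candidate.
        "equivalent_diameter": float(2 * np.sqrt(area / np.pi)),
    }
```

Each candidate then becomes one fixed-length feature vector, which is the input expected by the classifiers discussed in the remainder of this section.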

We briefly review the related work below, highlighting the challenges that have motivated the work in this thesis. In 2009, Cuenca et al. [32] proposed a CAD system for solitary pulmonary nodule detection in CT images using an iris filter. The system was evaluated on a private dataset, achieving a sensitivity of 80% with 7.7 FP/scan. However, the dataset used in this paper is private and contains a small number of nodules (77), so the system's performance is likely to be affected in other scenarios covering the broad range of nodule types present in clinical scans.

Murphy et al. [99] proposed a CAD system using local image features and k-nearest-neighbor classification. The system was evaluated on a private dataset of 813 scans, detecting pleural and non-pleural nodules of size 2–14 mm, and achieved a sensitivity of 80% with 4.2 FP/scan. Although the system uses a large dataset for its evaluation, it underperforms in terms of sensitivity.
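The k-nearest-neighbor rule used by Murphy et al. [99] (and by Soltaninejad et al. [68] for false positive reduction) assigns a candidate the majority label among its k closest training samples. The sketch below is a generic NumPy version with Euclidean distance and k = 3 as assumed defaults; it is not the configuration of either study, and in practice the features would normally be standardized first so that no single feature dominates the distance.

```python
import numpy as np


def knn_predict(train_X, train_y, test_X, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    train_X = np.asarray(train_X, dtype=float)
    test_X = np.asarray(test_X, dtype=float)
    train_y = np.asarray(train_y)
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    preds = []
    for idx in nearest:
        labels, counts = np.unique(train_y[idx], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)
```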

Guo et al. [100] proposed an adaptive lung nodule detection algorithm consisting of a feature selection part and a classification part. In feature selection, eight features were selected after extraction, and an SVM was applied as the classifier. The system shows satisfactory sensitivity, but standard datasets were not used to evaluate its performance, and the dataset used is too small, i.e. 29 scans with 2 mm slice thickness containing only 34 true nodules.

Liu et al. [101] presented a CAD-based pulmonary nodule detection method based on the analysis of enhanced voxels in 3D CT images. The method consists of multiple steps, including lung segmentation, enhancement of candidate nodules, voxel feature extraction and classification with an SVM. The system performs well, achieving a sensitivity of 93.75% with 4.6 FP/scan, but the dataset used consists of 32 cases containing only 33 solitary nodules.

Retico et al. [102] proposed a fully automated system to detect pleural nodules in low-dose CT images. A feature set of 12 texture and morphological features was extracted from each nodule candidate. The system achieved a sensitivity of 72% with 6 FP/scan, which shows that it underperforms in terms of sensitivity.

In 2010, Messay et al. [92] proposed a system for lung nodule detection in CT images. A set of 245 features was extracted, of which 40 were selected. The system was evaluated on the LIDC dataset, achieving a sensitivity of 82.66% with 3 FP/scan. It detected juxta-vascular, juxta-pleural and solitary nodules of size 3–30 mm. The system showed good overall performance but underperforms in terms of sensitivity.

Ozekes et al. [103] proposed a computerized lung nodule detection method using 3D feature extraction and learning-based algorithms. The system claimed a sensitivity of up to 100%, but it gives no information regarding the types of nodules considered, and a false positive rate of 44 per scan makes the scheme inefficient.

Sousa et al. [104] developed a method for the automatic detection of lung nodules in CT images. They used a subset of features to reduce complexity and increase the speed of the system: 24 features were initially extracted, from which the eight best were selected. The system obtained 0.42 FP/scan, an FN of 0.15 and a sensitivity of 84.84%. However, the number of nodules on which the system was tested is too small, i.e. 33 (23 benign and 10 malignant), so its performance is likely to be affected in other scenarios.

In 2011, Niemeijer et al. [94] showed that combining different CAD systems can increase performance compared with the individual systems. The results of two different challenges, ANODE09 and ROC09 (in which many state-of-the-art systems participated), were collected and combined, where ANODE09 consists of 55 lung CT scans. The combined results outperformed the individual systems, leading to the conclusion that combining different techniques can produce promising results.

In 2012, Mabrouk et al. [17] proposed a technique for the automatic classification of lung nodules in CT images using two classifiers. A total of 22 image features were extracted, and a Fisher score ranking method was used for feature selection to select the ten best features. The system showed good results for large nodules but failed to detect smaller nodules.
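Fisher score ranking, as used by Mabrouk et al. [17] for feature selection, scores each feature by the ratio of its between-class scatter to its within-class scatter. The sketch below uses the common definition F_j = Σ_c n_c (μ_{c,j} − μ_j)² / Σ_c n_c σ²_{c,j}; that this matches the exact variant used in [17] is an assumption, and the function names are hypothetical.

```python
import numpy as np


def fisher_scores(X, y):
    """Fisher score of each feature (column of X) for class labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)                      # overall mean of each feature
    num = np.zeros(X.shape[1])               # between-class scatter
    den = np.zeros(X.shape[1])               # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        num += n_c * (Xc.mean(axis=0) - mu) ** 2
        den += n_c * Xc.var(axis=0)
    return num / np.maximum(den, 1e-12)      # guard against zero within-class variance


def top_k_features(X, y, k):
    """Indices of the k features with the highest Fisher score, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```

A feature scores highly when its class means are far apart relative to the spread inside each class, which is why this ranking tends to keep features that separate nodules from non-nodules well on their own.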

In 2013, Assefa et al. [105] proposed a method based on template matching and multi-resolution analysis for lung nodule detection. Seven statistical and two intensity-based features were extracted for the false positive reduction stage. The system achieved a detection rate of 81%, but its very high false positive rate (35.15%) makes the scheme inefficient.

Choi et al. [18] proposed a detection method based on hierarchical block classification. The image was divided into sub-blocks, which were analyzed on the basis of entropy, and the sub-blocks with high entropy were selected. The system attained a sensitivity of 95.28% with only 2.27 FP/scan. It shows good overall performance, but its ability to detect all types of nodules is limited.
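The entropy-based block selection of Choi et al. [18] can be illustrated with a short sketch: the image is divided into non-overlapping blocks, the Shannon entropy of each block's gray-level histogram is computed, and only blocks above an entropy threshold are retained as informative. The 8-bit input, the block size of 16 and the threshold of 1.0 bit are assumptions for illustration, not values taken from [18].

```python
import numpy as np


def block_entropy(block, bins=256):
    """Shannon entropy (bits) of the gray-level histogram of an 8-bit block."""
    hist, _ = np.histogram(block, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


def informative_blocks(image, block=16, min_entropy=1.0):
    """(row, col, entropy) of every non-overlapping block whose entropy >= min_entropy."""
    h, w = image.shape
    keep = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            e = block_entropy(image[r:r + block, c:c + block])
            if e >= min_entropy:
                keep.append((r, c, e))
    return keep
```

Uniform regions such as background air or homogeneous soft tissue have near-zero entropy and are discarded, so the later and more expensive enhancement and thresholding steps only run on blocks that contain structure.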

Tariq et al. [106] proposed a CAD system for pulmonary nodule detection in CT scan images using a neuro-fuzzy classifier. A detailed feature set containing different properties was extracted and applied to the neuro-fuzzy classifier. The authors claimed that the method is effective and can also detect small nodules, but the standard datasets and metrics used to evaluate the system's performance were not discussed. In addition, no information is given regarding the types of nodules considered.

Orozco et al. [107] proposed a novel approach to lung nodule classification in CT images without lung segmentation. Eight texture features were extracted from the histogram and the gray-level co-occurrence matrix of each CT image, and an SVM trained on these features was used to classify nodule candidates into nodules and non-nodules. A reliability index of 84% was achieved. However, the system was tested on a private dataset of only 38 scans with nodules, and its accuracy is low compared with other techniques.
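The gray-level co-occurrence matrix (GLCM) behind texture features such as those of Orozco et al. [107] can be sketched in NumPy. This example builds a symmetric, normalized GLCM for one non-negative pixel offset after quantizing an assumed 8-bit image to 8 levels, and derives three common Haralick-style features (contrast, angular second moment and homogeneity). The offset, the number of levels and the function names are illustrative assumptions, and the eight features used in [107] are not reproduced here.

```python
import numpy as np


def glcm(image, levels=8, offset=(0, 1)):
    """Symmetric, normalized GLCM of an 8-bit image for a non-negative (row, col) offset."""
    dr, dc = offset
    q = np.clip((image.astype(float) * levels / 256).astype(int), 0, levels - 1)
    h, w = q.shape
    a = q[: h - dr, : w - dc]        # reference pixels
    b = q[dr:, dc:]                  # neighbours at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    m = m + m.T                      # count each pair in both directions
    return m / m.sum()


def glcm_features(p):
    """A few texture descriptors of a normalized GLCM `p`."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float(((i - j) ** 2 * p).sum()),             # local intensity variation
        "asm": float((p ** 2).sum()),                            # angular second moment
        "homogeneity": float((p / (1.0 + (i - j) ** 2)).sum()),  # closeness to the diagonal
    }
```

For a perfectly uniform patch all pairs fall on one diagonal cell (contrast 0, angular second moment 1), while strongly alternating texture pushes the probability mass away from the diagonal and raises the contrast.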

Tartar et al. [108] proposed a method for classifying pulmonary nodules using different features, namely 2-D and 3-D geometric and intensity-based statistical features. The system achieved 90.7% accuracy, 89.6% sensitivity and 87.5% specificity, and was evaluated on a private dataset of 95 pulmonary nodules.

In 2014, Teramoto et al. [109] proposed a hybrid method for the detection of pulmonary nodules using positron emission tomography/computed tomography (PET/CT). The method was evaluated on 100 cases of PET/CT images and achieved a sensitivity of 83.0% with 5.0 FP/scan. The system uses a novel approach of combining PET and CT images but underperforms in terms of sensitivity.

Choi et al. [12] proposed a computer-aided detection method based on a 3-D shape-based feature descriptor. A 3-D shape-based feature descriptor and a wall elimination method were introduced to include juxta-pleural nodules. Evaluated on LIDC images containing 148 nodules, the system achieved a sensitivity of 97.5% with 6.76 FP/scan. It shows good overall performance but underperforms regarding FP/scan.

In 2015, Ginneken et al. [110] proposed a system that used a convolutional neural network to extract features for lung nodule detection. 2D axial, sagittal and coronal patches were extracted for each nodule candidate, and 4096 features were extracted from the penultimate layer of the network. A linear SVM was used to classify the candidates into nodules and non-nodules. The system achieved a sensitivity of 78%.

In 2016, Akram et al. [57] proposed SVM-based classification of lung nodules using hybrid features from CT images. 2D and 3D geometric and intensity-based statistical features were extracted and used to train the classifier. A sensitivity of 95.31% is claimed, but no information is given regarding FP/scan. In addition, the number of nodules used to validate the results is too small, so the system's performance is likely to be affected in other scenarios.

Setio et al. [111] proposed a lung nodule detection system based on multi-view convolutional networks. The system employs three dedicated detectors for large, subsolid and solid nodules, and the final detection is obtained by combining multiple streams of 2D convolutional networks using a dedicated fusion method. The system was evaluated on 888 scans of the LIDC-IDRI dataset, with additional evaluation on the ANODE09 and DLCST datasets, and achieved a detection sensitivity of 90.1% with 4 FP/scan.

Anirudh et al. [112] proposed a lung nodule detection system using 3D convolutional neural networks, in which the 3D CNN learns discriminative features for nodule detection. The system starts from a point label on a single voxel of the nodule together with its estimated size; unsupervised learning is then used to estimate the final 3D label, which is used to train the network. The system was evaluated on 67 scans of the SPIE-AAPM LUNGx dataset and achieved a sensitivity of 80% with 10 FP/scan.

Jacobs et al. [113] compared the performance of two commercial CAD systems and one academic state-of-the-art CAD system on the LIDC-IDRI dataset, using 888 scans containing 777 nodules. The study also demonstrated that CAD can help find nodules missed by radiologists in the two-phase annotation process.

In 2017, Ding et al. [114] proposed a lung nodule detection system based on deep convolutional neural networks, consisting of two stages. In the first stage, a region-based convolutional neural network is applied to image slices for nodule detection; in the second stage, a 3D convolutional neural network is applied for false positive reduction. The system was evaluated on the Lung Nodule Analysis 2016 (LUNA16) challenge and achieved a high sensitivity of 94.4% with 4 FP/scan.

Setio et al. [115] developed LUNA16 (Lung Nodule Analysis 2016), an objective evaluation framework for comparing the performance of state-of-the-art CAD systems on LIDC-IDRI, the largest available public dataset, and investigated the possibility of combining different methods. Combining the outputs of different CAD systems yielded much better performance. 888 scans of the dataset, containing 1186 nodules, were used to compare the systems. The study also demonstrated that CAD can help find nodules missed by radiologists in the annotation phase.

Zhu et al. [116] proposed an automatic lung nodule detection and classification system named 'DeepLung', consisting of two main parts: nodule detection and classification. The detection part was built on a three-dimensional Faster R-CNN (faster region-based convolutional neural network). The detector was evaluated on the LUNA16 dataset using 10-fold cross-validation, while the classification part was validated on the LIDC-IDRI dataset. The system achieved a detection sensitivity of 83.4%. The review of these CAD systems is summarized in Table 2-3.

Table 2-3: Review of Current CAD Systems (N/A means not available)

| CAD Systems | Data Set | No. Cases | No. Nodules | Extracted Features | Sensitivity (%) | FPR | Remarks |
|---|---|---|---|---|---|---|---|
| Cuenca et al. [32] | Private | 22 | 77 | Intensity, Morphological | 80.00 | 7.70 | Used dataset is too small and contains few nodules. |
| Guo et al. [100] | Private | 29 | 34 | Shape | 94.77 | N/A | |
| Sousa et al. [104] | Private | N/A | 33 | Shape, Texture, Gradient, Histogram, Spatial | 84.84 | 0.42 | |
| Liu et al. [101] | Private | 32 | 33 | N/A | 93.75 | 4.60 | |
| Orozco et al. [107] | LIDC, ELCAP | 128 | 75 | Texture | 84.00 | 7.00 | Systems underperform in terms of sensitivity/accuracy. |
| Tartar et al. [108] | Private | 63 | 95 | Shape | 89.60 | 7.90 | |
| Messay et al. [92] | LIDC | 84 | 143 | Shape, Intensity, Gradient | 82.66 | 3.00 | |
| Murphy et al. [99] | Private | 813 | 1518 | Shape Index, Curvedness | 80.00 | 4.20 | |
| Retico et al. [102] | Private | 42 | 102 | Morphological, Texture | 72.00 | 6.00 | |
| Teramoto et al. [109] | Private | 100 | 103 | Shape, Intensity | 83.00 | 5.00 | |
| Ozekes et al. [103] | LIDC | 11 | 11 | Shape | 100.00 | 44.00 | High false positive rate makes the schemes inefficient. |
| Assefa et al. [105] | ELCAP | 50 | 165 | Intensity, Statistical | 81.00 | 35.15 | |
| Choi et al. [12] | LIDC | 84 | 148 | Shape-Based 3D Descriptor | 97.50 | 6.76 | |
| Mabrouk et al. [17] | Private | 12 | N/A | Shape, Intensity | 97.00 | 2.00 | System failed to detect smaller nodules. |
| Choi et al. [18] | LIDC | 58 | 151 | Shape, Intensity | 95.28 | 2.27 | System's ability to detect all types of nodules is limited. |
| Akram et al. [57] | LIDC | 47 | 50 | Shape, Intensity | 95.31 | N/A | System is evaluated with a small number of nodules and FP/scan is not reported. |
| Ginneken et al. [110] | LIDC | 865 | 1147 | Convolutional Neural Network | 78.00 | 4.00 | CNN may have a high computational cost and requires a large dataset for training, which is not mentioned in the last two studies. |
| Setio et al. [111] | LIDC-IDRI | 888 | | Convolutional Neural Network | 90.10 | 4.00 | |
| Anirudh et al. [112] | SPIE-AAPM LUNGx | 67 | N/A | Convolutional Neural Network | 80.00 | 10.00 | |
| Ding et al. [114] | LUNA16 | | | Convolutional Neural Network | 94.40 | 4.00 | |

In Table 2-3, the reviewed studies are organized into specific groups. The first section presents studies that used small datasets containing a small number of nodules; the performance of these systems is likely to degrade in more realistic scenarios with the broader range of nodule types present in clinical scans. The second section presents studies that underperform, having relatively lower accuracy/sensitivity than the other systems. The third section presents studies in which a high false positive rate is the major issue. The last section presents other studies, highlighting some additional challenges.

2.4 Problem Statement

In summary, the review of current schemes shows that they lack the ability to detect all nodules while maintaining high sensitivity and a low number of false positives per scan. Most of the algorithms are optimized for, and limited to, a particular dataset, which limits the generalization of the results. In addition, the current schemes have not been evaluated on sufficiently large datasets to demonstrate robustness. Therefore, methods evaluated on a small number of nodules are not guaranteed to deliver the same performance in all circumstances. Moreover, since feature extraction is very important for distinguishing nodules from other anatomical structures present in the lung region, the choice of an optimal feature set for nodule detection, whether via conventional feature-based approaches or convolutional neural networks, is still an unresolved issue. Thus, the real challenge is to build systems that are more accurate in terms of sensitivity and FP/scan across a more diverse range of nodules.
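Sensitivity and FP/scan, the two figures of merit used throughout this review, can be computed as in the sketch below. The hit criterion is an assumption: a detection counts as a true positive when it lies within the radius of a reference nodule, similar in spirit to the LUNA16 evaluation but not a reimplementation of its official scripts. The function name `evaluate_detections` and the input layout are hypothetical.

```python
import numpy as np


def evaluate_detections(scans):
    """Return (sensitivity in %, false positives per scan).

    `scans` is a list of (detections, nodules) pairs: detections is a sequence of
    (z, y, x) points; nodules is a list of ((z, y, x) center, radius) tuples.
    """
    hits = total = fps = 0
    for detections, nodules in scans:
        det = np.asarray(detections, dtype=float).reshape(-1, 3)
        matched = np.zeros(len(det), dtype=bool)    # detections explained by a nodule
        for center, radius in nodules:
            total += 1
            inside = np.linalg.norm(det - np.asarray(center, dtype=float), axis=1) <= radius
            if inside.any():
                hits += 1                           # nodule found at least once
            matched |= inside
        fps += int((~matched).sum())                # detections near no nodule
    return 100.0 * hits / total, fps / len(scans)
```

Because several detections may fall on the same nodule, sensitivity is counted per nodule while false positives are counted per unmatched detection, which is why the two numbers are reported as a pair.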

In this thesis, we present a novel technique for pulmonary nodule detection using a hybrid feature set and an SVM classifier. The proposed feature set was obtained after rigorous experimentation and has helped to reduce false positives significantly. Prior to nodule detection, an image enhancement technique is used to increase the detection rate of low-density nodules, which has helped to increase the sensitivity of the proposed system. A fully automated lung segmentation technique is applied using optimal thresholding and connected component labeling. To the best of our knowledge, no similar technique combining these steps has been reported. In addition to SVM, different classifiers have been used to evaluate the performance of the proposed system. Finally, an attempt has been made to determine the most relevant feature class in the extracted feature set. The overall sensitivity has been improved compared with previous methods, and FP/scan has been reduced significantly.


Chapter 3: PROPOSED SCHEME FOR LUNG NODULE DETECTION

The proposed methodology consists of a series of steps, starting with pre-processing, followed by lung segmentation, image enhancement, nodule detection, feature extraction and classification of lung nodules. The block diagram of the proposed method is shown in Figure 3-1. After pre-processing, the lung image is thresholded using optimal thresholding; background removal and hole filling are then performed before the lungs are segmented from the thresholded image. Contour correction is performed using morphological operations to include juxta-pleural nodules. Before ROI extraction, i.e. identifying the candidate nodules, it is very important to ensure that all candidate nodules have been included; to this end, the contour-corrected image is enhanced. The candidate nodules are detected and segmented simultaneously. Next, features are extracted from the lung nodule candidates and used for classification with an SVM classifier. In the following sections, each step of the proposed method is described in detail.
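The first segmentation steps in this pipeline (optimal thresholding, background removal and hole filling) can be sketched as follows. Here "optimal thresholding" is assumed to mean the iterative threshold selection commonly used for lung CT, in which the threshold is repeatedly set to the midpoint of the mean intensities below and above it; background removal is approximated by discarding dark connected components that touch the image border. This is an illustrative 2-D sketch with hypothetical function names, not the implementation used in this thesis.

```python
import numpy as np
from scipy import ndimage


def optimal_threshold(image, tol=0.5):
    """Iterative threshold selection: T = midpoint of the two class means, until stable."""
    img = image.astype(float)
    t = img.mean()
    while True:
        low, high = img[img <= t], img[img > t]
        if low.size == 0 or high.size == 0:
            return t
        new_t = (low.mean() + high.mean()) / 2.0
        if abs(new_t - t) < tol:
            return new_t
        t = new_t


def lung_mask(image):
    """Dark (air-like) regions inside the body, with enclosed holes filled."""
    dark = image <= optimal_threshold(image)   # lungs are darker than the chest wall
    labels, _ = ndimage.label(dark)
    # Background removal: drop dark components touching the border (air outside the body).
    border = np.unique(np.concatenate([labels[0], labels[-1], labels[:, 0], labels[:, -1]]))
    mask = dark & ~np.isin(labels, border[border > 0])
    # Hole filling: vessels and nodules enclosed by lung tissue become part of the mask.
    return ndimage.binary_fill_holes(mask)
```

Filling holes after thresholding matters for this application: bright vessels and nodules inside the lungs would otherwise be cut out of the lung mask before the nodule detection stage ever sees them.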

Figure 3-1: Flow Chart of the Proposed Method (Input CT Image → Pre-Processing → Lung Parenchyma Segmentation [Optimal Thresholding → Background Removal → Hole Filling → Contour Correction of Lung Lobes] → Image Enhancement → Candidate Nodule Detection → Feature Extraction → SVM → Nodule / Non-Nodule)


3.1 Lung Segmentation

Lung segmentation is of critical importance because it is a prerequisite for nodule detection; inaccurate lung volume segmentation can lead to low accuracy of the whole system. In this thesis, we propose a fully automated method for segmenting the lung volume from CT scan images.

3.1.1 Lung Image Preprocessing

Initially, the input CT images need preprocessing, which includes gray-scale conversion of the pixel values and denoising. Each pixel of the DICOM image occupies 2 bytes (16 bits), of which 12 bits are significant, so its gray-scale range is 0 to 4095; this range is converted to 0 to 255. The conversion is performed because a range of 4096 levels is unnecessarily large, and reducing it improves processing speed and saves pixel storage space. After gray-scale conversion, noise is removed from the image. Noise is inevitably introduced in the CT imaging process and, if not properly removed at this stage, can ultimately lead to false segmentation and classification, resulting in missed or false detection of nodules. Several image preprocessing techniques can eliminate noise and reduce this error; the most commonly used are median filtering, mean filtering, Gaussian filtering, Wiener filtering and the wavelet transform [13].

In this dissertation, median filtering [117] is used to remove noise from the initial CT images. The median filter effectively removes salt-and-pepper noise and speckle noise while preserving image details such as edges and other image features, and its operating principle is simple and fast.
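A median filter replaces each pixel with the median of its neighbourhood, so an isolated impulse (a single very bright or very dark pixel) is discarded rather than averaged in. The sketch below uses only NumPy. The 3 × 3 window and edge-replication border handling are assumptions, since the window size is not stated above, and `median_filter_2d` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_2d(img: np.ndarray, size: int = 3) -> np.ndarray:
    """size x size median filter (size must be odd); borders use edge replication."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    # View of every size x size neighbourhood: shape (H, W, size, size).
    windows = sliding_window_view(padded, (size, size))
    # Median over each neighbourhood, cast back to the input dtype.
    return np.median(windows, axis=(-2, -1)).astype(img.dtype)
```

On a uniform slice corrupted by a few isolated salt (255) and pepper (0) pixels, every 3 × 3 window still has a majority of clean pixels, so the output is the clean slice.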


3.1.2 Lung Parenchyma Segmentation

3.1.2.1 Overview of Image Segmentation Methods

After image preprocessing, the first step is to segment the lung region. Lung segmentation is the process of extracting the lung volume from the input CT image and removing the background and other irrelevant components. Since pulmonary nodules lie mainly within the lung volume, the segmentation of the lung region must be accurate and complete, and it should not contain irrelevant information. Accurate lung segmentation reduces the workload of doctors and is one of the most important steps of a lung CAD system.

Common image segmentation methods are region-based methods, boundary-based methods and methods based on specific theories. Region-based segmentation methods mainly include thresholding, region growing, region splitting and clustering [56]. Thresholding is one of the most commonly used of these: it exploits the bimodal histogram of lung CT images and is quite effective for initial segmentation. Thresholding variants include global, adaptive, optimal, multiple and variable thresholding. Region growing obtains larger regions from an initial seed point by grouping neighboring pixels that satisfy a predefined similarity criterion. The key to region growing is the selection of the initial seed point and of the growth rule. This method segments well, but seed-point selection requires human intervention and therefore does not meet the requirement of automatic processing. Boundary-based segmentation mainly detects edge pixels from the gray-scale gradient and links them into a boundary contour, from which the final segmentation is obtained. The image gradient can be calculated using the Sobel, Canny, Roberts or Gaussian operator. This approach is very sensitive to edges, but it can produce spurious, missing or discontinuous edges, leading to inaccurate segmentation. Furthermore, it is quite sensitive to noise and does not work well on low-contrast images.

3.1.2.2 Proposed Method of Lung Volume Segmentation

The segmentation of the lung volume is the basis of the subsequent nodule detection. Since the doctor examines only the area within the lung volume, irrelevant information outside the lung volume must be eliminated; this reduces the amount of information the doctor has to examine and is important in pulmonary nodule screening. In the original CT images, the density of the lung volume differs from that of the background, which allows intensity-based segmentation techniques to be used effectively. We have not used the region-growing algorithm, mainly because it requires manual selection of the seed point and is relatively slow. In this research, we use optimal thresholding followed by connected component labeling and contour correction [49,118]. The proposed workflow for lung volume segmentation is shown in Figure 3-2.

Figure 3-2: Lung Parenchymal Segmentation Flow Chart

Lung segmentation consists of a series of steps. First, the CT image is segmented using optimal thresholding, the lung volume is obtained using connected component labeling, and other irrelevant information is removed. The resulting image contains holes, which are filled with a hole-filling algorithm based on morphological operations. Finally, the lung contour is smoothed with a rolling ball algorithm to include any juxta-pleural nodules. Each of these steps is described in detail in the following sections.

3.1.2.3 Thresholding

For optimal thresholding, let $T_i$ be the threshold after the $i$-th step. The lung CT scan can be divided into two density groups. The gray-scale values of a lung CT scan normally vary from 26 to 250 (-1000 HU to +1000 HU). The lung area, also called the non-body area, is a low-density area whose gray-scale values range from 50 to 150 (-910 HU to -500 HU) [49]. The area occupied by the CT scanner is also part of the non-body area, whereas the body area contains the tissue surrounding the lung region. Because the lungs lie in the non-body area, we initially select a threshold value of 150 (-500 HU) for $T_0$. To select a new threshold, we apply $T_i$ to the lung image. Let $\mu_o$ and $\mu_b$ be the mean intensities of the object and the background in the lung image, respectively; the new threshold is then given by [49]:

$$T_{i+1} = \frac{\mu_o + \mu_b}{2} \tag{3.1}$$

where $\mu_o$ and $\mu_b$ are calculated as:

$$\mu_o = \frac{\sum_{i=0}^{T_i} i\,p_i}{\sum_{i=0}^{T_i} p_i} \tag{3.2}$$

$$\mu_b = \frac{\sum_{i=T_i+1}^{L} i\,p_i}{\sum_{i=T_i+1}^{L} p_i} \tag{3.3}$$

where $p_i$ is the probability of gray level $i$ and $L$ is the highest gray level. This iterative procedure continues until the threshold converges, i.e., until the difference between $T_{i+1}$ and $T_i$ is smaller than a predefined value. At this point the iteration stops and the optimal threshold $T_{op}$ is obtained. The initial segmented lung volume $f(x, y, z)$ is then obtained as follows:

$$f(x, y, z) = \begin{cases} 1, & f(x, y, z) \ge T_{op} \\ 0, & f(x, y, z) < T_{op} \end{cases} \tag{3.4}$$

where $x$ and $y$ are the in-slice coordinates and $z$ is the slice index. The volume is a stack of slices, each of size $x \times y$ pixels. Results of optimal thresholding on a few sample images are shown in column (b) of Figure 3-3.
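The iterative optimal threshold of Equations (3.1) to (3.4) can be sketched from the gray-level histogram. This is a minimal sketch, assuming 8-bit input and a hypothetical convergence tolerance `tol` standing in for the "predefined value"; the function names are hypothetical, and the two classes are simply the levels at or below and above the current threshold, as in Equations (3.2) and (3.3).

```python
import numpy as np

def optimal_threshold(img: np.ndarray, t0: float = 150.0,
                      tol: float = 0.5, max_iter: int = 100) -> float:
    """Iterative optimal threshold (Eqs. 3.1 to 3.3) for an 8-bit image or volume."""
    levels = np.arange(256, dtype=np.float64)
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                       # p_i: probability of gray level i
    t = t0
    for _ in range(max_iter):
        low = levels <= t                       # levels 0..T_i     -> mu_o, Eq. (3.2)
        high = ~low                             # levels T_i+1..L   -> mu_b, Eq. (3.3)
        w_low, w_high = p[low].sum(), p[high].sum()
        if w_low == 0 or w_high == 0:           # one class empty: cannot update
            break
        mu_o = (levels[low] * p[low]).sum() / w_low
        mu_b = (levels[high] * p[high]).sum() / w_high
        t_new = 0.5 * (mu_o + mu_b)             # Eq. (3.1)
        converged = abs(t_new - t) < tol
        t = t_new
        if converged:
            break
    return t

def binarize(img: np.ndarray, t_op: float) -> np.ndarray:
    """Eq. (3.4): 1 where the gray value is at least T_op, else 0."""
    return (img >= t_op).astype(np.uint8)
```

For a synthetic image with two equally frequent gray levels, 50 and 200, the first update moves the threshold from 150 to 125 and the second confirms it, so the iteration stops at 125.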

After optimal thresholding, we obtain a lung CT image that contains body and non-body areas: white pixels belong to the non-body area and black pixels to the body area. We are interested in extracting the lung region from the non-body voxels. To achieve this, we apply 3D connected component labeling to the initially thresholded image $f(x, y, z)$. We use the 18-connected neighborhood to obtain the 3D connected components; in this neighborhood, two voxels are neighbors if they share a face or an edge, which provides tight connectivity. After labeling, the lung regions are selected based on the size of the labeled volumes. The air surrounding the body is easily removed because it is connected to the border of the volume. The first and second largest remaining volumes are then selected as the lungs, so most unwanted components (air outside the body and gas in the intestine) are discarded. The resulting image contains holes in the lung region, which may be potential nodules or vessels. These must be included in the lung region for accurate detection and are therefore filled by morphological operations. The resulting images are shown in column (c) of Figure 3-3. The hole-filled image may still miss potential nodules attached to the lung border, known as juxta-pleural nodules. These nodules must be included for accurate detection; to include them, we use a rolling ball algorithm [118].
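The labeling, selection and hole-filling steps can be sketched with SciPy's `ndimage` module, where `generate_binary_structure(3, 2)` is the 18-connected neighbourhood (face and edge neighbours). This is a sketch under stated assumptions: the input `air_mask` is a boolean (z, y, x) volume marking the low-density (air and lung) voxels, "connected to the border" is tested only against the in-plane image edges (lungs legitimately reach the first and last slices), holes are filled slice by slice, and `extract_lungs` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage

def extract_lungs(air_mask: np.ndarray) -> np.ndarray:
    """Keep the two largest 18-connected air components not touching the in-plane border."""
    structure = ndimage.generate_binary_structure(3, 2)   # 18-connectivity in 3D
    labels, n = ndimage.label(air_mask, structure=structure)
    if n == 0:
        return np.zeros(air_mask.shape, dtype=bool)
    # Labels touching the left/right/top/bottom edges of any slice: air outside the body.
    border = np.unique(np.concatenate([
        labels[:, 0, :].ravel(), labels[:, -1, :].ravel(),
        labels[:, :, 0].ravel(), labels[:, :, -1].ravel()]))
    sizes = np.bincount(labels.ravel(), minlength=n + 1)
    sizes[0] = 0                                           # background is never a lung
    sizes[border] = 0                                      # discard border-connected air
    largest_two = np.argsort(sizes)[::-1][:2]
    keep = [lab for lab in largest_two if sizes[lab] > 0]
    lungs = np.isin(labels, keep)
    # Fill holes (vessels, nodules) slice by slice.
    for z in range(lungs.shape[0]):
        lungs[z] = ndimage.binary_fill_holes(lungs[z])
    return lungs
```

Small components such as intestinal gas rank below the two lungs and are dropped, while the outside air is removed because it touches the image border.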

3.1.2.4 Boundary Repair of Lung Parenchyma

Various methods have been proposed for repairing gaps in the lung boundaries, and the rolling ball method is one of the most popular. In this method, a two-dimensional ball filter is placed tangentially on the lung boundary and rolled along it. A gap in the lung boundary is identified when the rolling ball filter touches the boundary at more than two points, and the gap is filled by a new contour segment that linearly connects the two end points of the gap. The basic principle of this method is to use changes in boundary curvature, together with a circular filter template, in a morphological operation on the lung contour so as to smooth it.

An important aspect to consider is the size of the rolling ball filter. If the radius is too small, the smoothing has little effect on the lung contour, the desired segmentation is not achieved, and the detection accuracy of the system can drop.

Conversely, if the radius is too large, parts of the pleura may be included in the lung contour, adding interference to the nodule area and affecting the results. It is therefore important to set an appropriate radius after a large number of validations. The final images after lung contour correction are shown in column (e) of Figure 3-3.
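A common way to approximate the rolling ball repair is a morphological closing with a disk-shaped structuring element: any indentation narrower than the disk is filled, while the convex parts of the contour are untouched. This substitutes closing for the tangential contact test described above, so it is an approximation and not necessarily the implementation used in this work; the radius value and the names `disk` and `rolling_ball_close` are hypothetical.

```python
import numpy as np
from scipy import ndimage

def disk(radius: int) -> np.ndarray:
    """Boolean disk-shaped structuring element of the given radius."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return xx**2 + yy**2 <= radius**2

def rolling_ball_close(lung_slice: np.ndarray, radius: int = 5) -> np.ndarray:
    """Fill boundary indentations narrower than the ball (closing with a disk)."""
    mask = lung_slice.astype(bool)
    # Pad so that erosion near the image border does not cut into the lung.
    padded = np.pad(mask, radius)
    closed = ndimage.binary_closing(padded, structure=disk(radius))
    closed = closed[radius:-radius, radius:-radius]
    return closed | mask          # closing is extensive; the OR guards against edge effects
```

The trade-off discussed above maps directly onto `radius`: a small disk leaves juxta-pleural indentations open, and a large one bridges into the pleura.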


Figure 3-3: Example images of lung volume segmentation, (a) to (e) from left to right

presenting input, thresholded, hole filled, lung segmented and contour corrected images,

respectively.


3.1.2.5 Separation of Left and Right Lung Parenchyma

Another aspect that needs to be explained is the separation of the right and left lung regions. After extraction of the lung volume, gray-level thresholding often fails to separate the right and left lungs completely, and a junction may remain between them; this junction must be removed before further processing, including smoothing of the lung contours. The pseudo-code for lung separation is as follows:

1. Binarize the input connected lung region.

2. Set the initial separation column to x = 256, the horizontal center of a 512 × 512 image.

3. Set the horizontal scan area to 20 columns on each side of x, i.e., columns 236 to 276.

4. For the current column, start a vertical scan from row 0.

5. Record the row of the first pixel with the maximum gray value, continue scanning, and record the row of the first subsequent pixel with value 0.

6. Store the difference ΔL between the two rows.

7. Repeat steps (4) to (6) for every column in the scan area.

8. Starting from x = 256, compare the changes in ΔL over the 20 columns on the left with those over the 20 columns on the right, and follow the side on which ΔL first becomes smaller. If the value found there is still the smallest when compared with the other side, that column is taken as the separation position; otherwise, the search moves from 256 toward the side on which ΔL becomes smaller.

9. Repeat steps (4) to (8) until the separation position is found.

After the left and right lungs are separated, the gray-level values at the separation position are set to 0. This process is illustrated in Figure 3-4, where (b) shows the output image of this algorithm.
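The column scan can be sketched as follows. This is a simplified reading of the pseudo-code, not a transcription of it: instead of walking outward from the center column in the direction of decreasing ΔL (steps 8 and 9), it takes the column with the smallest ΔL in the whole scan area (ties broken toward the center), and it zeroes only the measured junction run in that column. The names `first_run` and `separate_lungs` and the small-image parameters in the usage example are hypothetical.

```python
import numpy as np

def first_run(column: np.ndarray):
    """(start, end) rows of the first foreground run when scanning a column from the top."""
    fg = np.flatnonzero(column)
    if fg.size == 0:
        return 0, 0
    start = int(fg[0])
    gaps = np.flatnonzero(~column[start:])          # first background pixel after start
    end = start + int(gaps[0]) if gaps.size else column.size
    return start, end

def separate_lungs(mask: np.ndarray, center: int = 256, half_width: int = 20):
    """Cut the lung junction at the column with the smallest Delta L in the scan area."""
    out = mask.astype(bool).copy()
    lo = max(center - half_width, 0)
    hi = min(center + half_width, out.shape[1] - 1)
    runs = {x: first_run(out[:, x]) for x in range(lo, hi + 1)}
    delta = {x: e - s for x, (s, e) in runs.items() if e > s}   # Delta L per column
    if not delta:
        return out, None                             # nothing to cut in the scan area
    # Smallest Delta L wins; ties go to the column nearest the center.
    sep = min(delta, key=lambda x: (delta[x], abs(x - center)))
    s, e = runs[sep]
    out[s:e, sep] = False                            # set the junction pixels to 0
    return out, sep
```

On a 64 × 64 test image with two lung blocks joined by a bridge whose narrowest column is 30, `separate_lungs(mask, center=32, half_width=8)` cuts the bridge at column 30.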


Figure 3-4: (a) Parenchymal image; (b) image after repair of the lung parenchyma; (c) zoomed view of the left and right lung contour separation.

3.2 Image Enhancement and Nodule Detection

Image enhancement is critical for the sensitivity of a lung nodule detection system: it plays an important role in nodule detection by enhancing nodules, and it reduces false positives by weakening the other structures in the lung region [119]. It is also necessary because some low-density nodules might otherwise remain undetected, and every potential nodule candidate must be accounted for. Because of the CT imaging process and the complexity of the lung parenchyma, lung nodules, blood vessels and bronchi exhibit almost the same gray-scale values, and this similarity makes accurate extraction of pulmonary nodules extremely difficult. To extract suspected pulmonary nodules accurately, it is therefore necessary to modify the gray-scale values of these regions and increase the contrast between nodules and other structures. Since our region of interest is the pulmonary nodule, image enhancement must be performed on this region of interest, which includes low-density nodules, also known as ground-glass nodules, before nodule detection. Enhancement of the region of interest highlights the suspected pulmonary nodules in the image and reduces the gray-scale values of the non-nodular region, so that the contrast between the two improves and the suspected pulmonary nodules are more likely to be extracted.

Image enhancement highlights some image features and weakens others, depending on the specific circumstances, to achieve a good image effect. It has two main objectives: first, to improve the visual perception of the image itself; second, to support the analysis and decomposition of the region of interest so that the information it contains can be studied.

In summary, image enhancement is one of the key steps in the detection of pulmonary nodules. ROI enhancement is divided into three main steps: first, the lung parenchyma region is obtained, as described in the previous section; second, the ROI is enhanced; and third, candidate nodules are classified into nodules and non-nodules based on their characteristics for initial screening.

3.2.1 Theoretical Research on Image Enhancement Algorithm

There are many image enhancement methods, such as contrast stretching, gamma correction, histogram equalization, and frequency-domain and spatial-domain sharpening [120]. Enhancement techniques can be divided into two categories: (i) spatial-domain and (ii) frequency-domain techniques. In either domain, the aim is to highlight useful image information and suppress interference, so that the transformed image provides a good basis for the next stage and the required image regions can be extracted more reliably.


3.2.2 Multi-Scale Enhancement Algorithm Based on Hessian Matrix

Because nodules are roughly circular or dot-like objects, the gray-scale values of such shapes can be selectively enhanced. A conventional spatial enhancement algorithm enhances vascular tissue, bronchial tissue and nodules in the image alike, but it mainly emphasizes the edges of these structures, and its gray-scale contrast enhancement is relatively limited. To better extract nodules, it is necessary to enhance the intensity of circular regions of the image and suppress the gray-scale values of blood vessels, thereby enhancing the area of interest as a whole. The area of interest here is the suspected pulmonary nodules. However, pulmonary nodules and other lung tissues, such as cross-sections of blood vessels, both appear as dense circular regions of similar density, which is one of the main difficulties in the extraction and detection of pulmonary nodules.

In this dissertation, we propose a multi-scale dot-enhancing filter [121] based on the Hessian matrix for image enhancement. In the first step, Gaussian smoothing [122] is applied to each 2D slice to reduce noise and the sensitivity of the subsequent derivative computation to it. 2D smoothing is used because it produces promising results with lower computational complexity. After Gaussian smoothing, the Hessian matrix and its eigenvalues $\lambda_1$ and $\lambda_2$, with $|\lambda_2| < |\lambda_1|$, are calculated for every pixel to determine the local shape of the structure [119]. A suspected pulmonary nodule appears as a circular or oval object, whereas vascular tissue appears as a line-like, elongated structure; this property can therefore be used to distinguish the different structures present in the lung region [123]. Figure 3-5 shows the flow chart of the enhancement algorithm, and each of its steps is described in detail in the following sections.


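Figure 3-5: Multi-Scale Circular Filter Enhancement Algorithm Flow Chart (start → set the initial scale → Gaussian smoothing according to the scale → obtain eigenvalues from the Hessian matrix → compare the enhanced image with the previous one → if the scale has not reached its threshold, modify the scale and repeat; otherwise, output the enhanced images → end)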

3.2.2.1 Initial Enhancement Based on Gaussian Function

The Gaussian filter is a linear smoothing filter that is widely used in image processing. Its most important parameter is the scale parameter $\sigma$, which controls the degree of smoothing. When $\sigma$ is small, edges are localized more accurately, but the smoothing effect is weaker and noise suppression is poorer. Conversely, as $\sigma$ increases, the signal becomes smoother and noise is removed more effectively, but edges are blurred and can shift from their true positions, which is another serious issue. The choice of $\sigma$ in the Gaussian filter is therefore important. In lung CT images, pulmonary nodules are circular or dot-like objects, and a dot (the target pulmonary nodule) can be approximated by the Gaussian function (3.5) [121]:

$$d(x, y) = \exp\left(-\frac{x^2 + y^2}{2\sigma_0^2}\right) \tag{3.5}$$

Based on the normal distribution in probability theory, when the radius of a lung nodule is $\sigma_0$, the nodule accounts for 49.91% of the area of the Gaussian distribution function; when the radius is $2\sigma_0$, it accounts for 68.26%; and when the radius is $4\sigma_0$, it accounts for 95% [121]. In other words, to detect nodules of a specified diameter $d$, the scale parameter should be set to $d/4$. A Gaussian function with this scale parameter is the most appropriate approximation and enhances target pulmonary nodules of the specified size. The two-dimensional Gaussian function is calculated as [119]:

$$G(x, y, \sigma_f) = \frac{1}{2\pi\sigma_f^2}\exp\left(-\frac{x^2 + y^2}{2\sigma_f^2}\right) \tag{3.6}$$

The convolution response of the target nodule and the Gaussian function is given by:

$$R(x, y, \sigma_f) = G(x, y, \sigma_f) * d(x, y) \tag{3.7}$$

At the object center $(0, 0)$, the strongest Gaussian response $R(0, 0, \sigma_f)$ is obtained when the two scales are the same. Thus, for a set object scale $\sigma_0$, when

$$\frac{\partial R(0, 0, \sigma_f)}{\partial \sigma_f} = 0,$$

$R$ reaches its maximum value, and $\sigma_f = \sigma_0$ is the optimal scale for lung nodule enhancement.
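The scale rule and the kernel of Equation (3.6) can be sketched numerically. The $d/4$ rule means that a nodule of diameter $d$ spans $\pm 2\sigma$ around its center, and a standard fact about the normal distribution is that about 95.45% of a 1-D Gaussian profile lies within $\pm 2\sigma$ (about 68.27% within $\pm\sigma$). The 3σ kernel truncation and the function names are assumptions made for illustration.

```python
import math
import numpy as np

def scale_for_diameter(d: float) -> float:
    """Gaussian scale sigma = d / 4 for a target nodule of diameter d (in pixels)."""
    return d / 4.0

def gaussian_kernel_2d(sigma: float, truncate: float = 3.0) -> np.ndarray:
    """Sampled G(x, y, sigma) of Eq. (3.6), truncated at truncate * sigma."""
    r = int(math.ceil(truncate * sigma))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * math.pi * sigma**2)
    return g / g.sum()  # renormalize so the truncated kernel still sums to 1

def central_mass_1d(k: float) -> float:
    """Fraction of a 1-D normal distribution lying within +/- k sigma of its mean."""
    return math.erf(k / math.sqrt(2.0))
```

For an 8-pixel nodule, `scale_for_diameter(8)` gives σ = 2, which yields a 13 × 13 kernel, and `central_mass_1d(2)` is about 0.954.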

3.2.2.2 Hessian Matrix Construction

The Hessian matrix method extracts the characteristic directions of an image through second-order differentiation, and its eigenvalues can be used to distinguish different types of points and structures in the image, such as nodules and the edges of blood vessels in a lung image. Since 2D smoothing is applied to each 2D slice in this dissertation, the Hessian matrix at a measured point is the second-order real symmetric matrix [121]:

$$H = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} \tag{3.8}$$

The corresponding characteristic equation is:

$$(f_{xx} - \lambda)(f_{yy} - \lambda) - f_{xy} f_{yx} = 0 \tag{3.9}$$

Solving it gives the two eigenvalues $\lambda_1$ and $\lambda_2$ of the Hessian matrix:

$$\lambda_1 = \frac{K + Q}{2} \tag{3.10}$$

$$\lambda_2 = \frac{K - Q}{2} \tag{3.11}$$

where $K$ and $Q$ are:

$$K = f_{xx} + f_{yy} \tag{3.12}$$

$$Q = \sqrt{(f_{xx} - f_{yy})^2 + 4 f_{xy} f_{yx}} \tag{3.13}$$

This completes the construction of the second-order Hessian matrix for the next stage. The magnitudes of the Hessian eigenvalues are the basic reference parameters of the circular enhancement filter.
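Equations (3.8) to (3.13) can be evaluated per pixel using Gaussian-derivative filtering, which combines the smoothing and the differentiation in one step. The sketch uses `scipy.ndimage.gaussian_filter` with its `order` argument, where row index = $y$ and column index = $x$. One assumption deserves attention: Equations (3.10) and (3.11) order the eigenvalues algebraically, while the filter conditions in (3.16) to (3.18) rely on $|\lambda_1| \ge |\lambda_2|$, so the sketch re-sorts them by magnitude. The function name is hypothetical.

```python
import numpy as np
from scipy import ndimage

def hessian_eigenvalues(img: np.ndarray, sigma: float):
    """Per-pixel eigenvalues of the 2-D Hessian of the Gaussian-smoothed image.

    Returns (lam1, lam2) re-sorted so that |lam1| >= |lam2| at every pixel.
    """
    img = img.astype(np.float64)
    # Second derivatives of the smoothed image (axis 0 = y rows, axis 1 = x columns).
    f_yy = ndimage.gaussian_filter(img, sigma, order=(2, 0))
    f_xx = ndimage.gaussian_filter(img, sigma, order=(0, 2))
    f_xy = ndimage.gaussian_filter(img, sigma, order=(1, 1))   # f_xy = f_yx
    k = f_xx + f_yy                                            # Eq. (3.12)
    q = np.sqrt((f_xx - f_yy) ** 2 + 4.0 * f_xy * f_xy)        # Eq. (3.13)
    a = 0.5 * (k + q)                                          # Eq. (3.10)
    b = 0.5 * (k - q)                                          # Eq. (3.11)
    swap = np.abs(b) > np.abs(a)
    lam1 = np.where(swap, b, a)
    lam2 = np.where(swap, a, b)
    return lam1, lam2
```

At the center of a bright Gaussian blob, both eigenvalues are negative and nearly equal, whereas at the center of a bright ridge (a vessel-like line), one is strongly negative and the other is close to zero.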


3.2.2.3 Construction of Circular Enhancement Filter

As described earlier, suspected pulmonary nodules in the lung region are circular or oval, whereas vascular tissue forms line-like, elongated structures. It is therefore possible to distinguish these shapes with a circular enhancement filter that enhances suspected pulmonary nodules and suppresses vascular tissue. In this thesis, a circular enhancement filter in two-dimensional space is used. The circular structure and the line structure can be modeled as:

$$d(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{3.14}$$

$$l(x, y) = e^{-\frac{x^2}{2\sigma^2}} \tag{3.15}$$

For a two-dimensional image $f(x, y)$, let $\lambda_1$ and $\lambda_2$ be the two eigenvalues of the Hessian matrix, with $|\lambda_1| \ge |\lambda_2|$. For circular and line structures, the two eigenvalues must satisfy the corresponding conditions. For circular structures [119]:

$$\lambda_1 = \lambda_2 \ll 0 \tag{3.16}$$

and for line structures:

$$\lambda_1 \ll 0, \quad \lambda_2 = 0 \tag{3.17}$$

Here, we assume that bright objects are to be enhanced against a dark background. The filter response is calculated as:

$$E_{circle} = \begin{cases} |\lambda_2|^2 / |\lambda_1|, & \lambda_1 < 0,\ \lambda_2 < 0 \\ 0, & \text{otherwise} \end{cases} \tag{3.18}$$
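Equation (3.18) rewards blobs and penalizes lines: for an ideal circle ($\lambda_1 = \lambda_2$) the response equals $|\lambda_1|$, while for a line ($\lambda_2 \to 0$) it vanishes. A minimal sketch of the response on precomputed eigenvalue arrays, assuming they are already ordered so that $|\lambda_1| \ge |\lambda_2|$ (the function name is hypothetical):

```python
import numpy as np

def circle_response(lam1, lam2) -> np.ndarray:
    """Dot-enhancement response of Eq. (3.18); assumes |lam1| >= |lam2| per pixel."""
    lam1 = np.asarray(lam1, dtype=np.float64)
    lam2 = np.asarray(lam2, dtype=np.float64)
    bright_blob = (lam1 < 0) & (lam2 < 0)          # both curvatures negative
    # Substitute 1.0 outside the blob mask only to avoid dividing by zero there.
    safe_lam1 = np.where(bright_blob, np.abs(lam1), 1.0)
    return np.where(bright_blob, lam2 ** 2 / safe_lam1, 0.0)
```

With $\lambda_1 = \lambda_2 = -2$ the response is 2; with $\lambda_1 = -2$ and $\lambda_2 = -0.1$ (nearly a line) it drops to 0.005; and a dark blob ($\lambda_1, \lambda_2 > 0$) gives 0.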

Because pulmonary nodules have different diameters, a single enhancement scale is not sufficient. We therefore use multi-scale enhancement filtering to optimize the extraction. Assuming that the nodules to be detected have diameters in the range $[d_0, d_1]$, the $N$ discrete smoothing scales in the range $[d_0/4, d_1/4]$ can be computed as [121]:

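$$\sigma_1 = \frac{d_0}{4},\quad \sigma_2 = r\,\frac{d_0}{4},\quad \sigma_3 = r^2\,\frac{d_0}{4},\quad \ldots,\quad \sigma_N = r^{N-1}\,\frac{d_0}{4} = \frac{d_1}{4} \tag{3.19}$$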

where $r = (d_1/d_0)^{1/(N-1)}$, and each scale $\sigma$ corresponds to a nodule diameter of $4\sigma$. The algorithm works as follows. First, the scale $\sigma$ is determined using Equation (3.19), and the image is smoothed with the Gaussian function; the smallest scale is selected first, and the scale is then increased step by step. Next, the two eigenvalues $\lambda_1$ and $\lambda_2$ of the Hessian matrix are calculated, followed by the corresponding value of the $E_{circle}$ filter. This process is repeated for all scales, and the filter outputs are finally combined by taking their maximum, which gives the best enhancement, to generate the resulting image:

$$I_D(x, y) = \begin{cases} 1, & \text{if } E_{circle,\max} \\ 0, & \text{otherwise} \end{cases} \tag{3.20}$$

where $E_{circle,\max} = \max_{\sigma \in [\sigma_{\min}, \sigma_{\max}]} E_{circle}$. Figure 3-6 shows the results of image enhancement on different slices.
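The multi-scale procedure can be sketched end to end: generate the scales of Equation (3.19), compute the per-scale $E_{circle}$ response, and keep the pixel-wise maximum. Two assumptions are labeled here and in the code. First, the second derivatives are multiplied by $\sigma^2$ (the usual scale normalization that makes responses comparable across scales); the text above does not state this. Second, the sketch returns the $E_{circle,\max}$ map and leaves the binarization of Equation (3.20) to the caller. The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def scales(d0: float, d1: float, n: int) -> np.ndarray:
    """Eq. (3.19): n geometrically spaced scales from d0/4 to d1/4."""
    if n == 1:
        return np.array([d0 / 4.0])
    r = (d1 / d0) ** (1.0 / (n - 1))
    return (d0 / 4.0) * r ** np.arange(n)

def multiscale_dot_enhance(img: np.ndarray, d0: float, d1: float, n: int) -> np.ndarray:
    """Pixel-wise maximum of the E_circle response over the scales of Eq. (3.19)."""
    img = img.astype(np.float64)
    best = np.zeros_like(img)
    for s in scales(d0, d1, n):
        # Assumption: scale-normalized (s**2) second derivatives.
        f_yy = s**2 * ndimage.gaussian_filter(img, s, order=(2, 0))
        f_xx = s**2 * ndimage.gaussian_filter(img, s, order=(0, 2))
        f_xy = s**2 * ndimage.gaussian_filter(img, s, order=(1, 1))
        k = f_xx + f_yy
        q = np.sqrt((f_xx - f_yy) ** 2 + 4.0 * f_xy ** 2)
        a, b = 0.5 * (k + q), 0.5 * (k - q)
        first = np.abs(a) >= np.abs(b)
        lam1 = np.where(first, a, b)                 # |lam1| >= |lam2|
        lam2 = np.where(first, b, a)
        neg = (lam1 < 0) & (lam2 < 0)
        e = np.where(neg, lam2 ** 2 / np.where(neg, np.abs(lam1), 1.0), 0.0)
        best = np.maximum(best, e)                   # E_circle,max over scales
    return best
```

For nodule diameters from 4 to 16 pixels with N = 3, the scales are 1, 2 and 4. On a test image containing a small bright disk and a bright horizontal line, the response at the disk center is far larger than on the line.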


Figure 3-6: Example images showing the results of image enhancement on different slices. (a) and (b) show a low-density nodule (red circle) that is detected after image enhancement, whereas (c) and (d) show two other slices after image enhancement.

3.3 Lung Nodule Detection and Classification

After the previous steps, the lung parenchyma region containing the lung nodules has been extracted. The most important and final step of computer-aided detection of pulmonary nodules, which also produces the output of the CAD system, is the detection and classification of suspected pulmonary nodules. This step involves not only image processing but also data mining techniques.

Pulmonary nodules can be detected using different techniques, including image segmentation, image matching and image enhancement. The lung parenchyma region contains vascular tissue, bronchial tissue and pulmonary nodules, so a more precise algorithm than earlier detection methods is needed, because detection errors directly affect the results of the classification system. The least tolerable outcome for a radiologist is a missed nodule, since it may be cancerous. Moreover, radiologists deal with lung diseases other than lung cancer, such as tuberculosis and pneumonia. The primary requirement of a computer-aided detection system is therefore that it must not miss any potential nodule candidate, even at the cost of some false detections.


Nodule detection based on image segmentation mainly relies on gray-scale thresholding, but the threshold value must be properly adjusted to improve the detection of suspected pulmonary nodules. Other segmentation-based techniques besides thresholding are described in the literature review.

Next, image matching techniques construct a pulmonary nodule template from the morphological features and gray-scale variation of nodules and match it against regions in the patient's lung parenchyma. Because there are many types of pulmonary nodules and the number of lung cancer patients grows every day, the diversity of lung nodules makes it challenging to create a nodule template. Researchers have tried to build template libraries of lung nodules to increase the detection accuracy of early lung cancer, but this remains an unresolved issue.

Finally, image enhancement techniques in the spatial and frequency domains can also be used to detect suspected pulmonary nodules. What makes pulmonary nodule detection difficult is that as many non-nodule areas as possible should be excluded without missing any potential nodule candidate. In other words, the rules of the algorithm must be flexible enough to perform this task precisely and accurately.

The classification of pulmonary nodules after detection is also known as false positive reduction, and it consists of two steps. In the first step, features are extracted from the suspected pulmonary nodules. Nodule features include intensity (gray-scale) features, shape features, texture features and some other features [30, 92]. The selection of nodule features is very important: many features have been reported in the literature, different feature sets influence the detection results to different degrees, and a feature set that increases the accuracy of the detection system may also lengthen its processing time. Features differ in their effect and in their degree of correlation. It is therefore necessary to calculate a large set of features, select the optimal feature set using feature selection and data mining techniques, and finally use one or more classifiers for nodule classification. The detection capability depends on the optimization of the feature set, the key being the number of features, their impact and their correlation [124].

3.3.1 Rule-Based Analysis of Lung Nodule Candidates

In this dissertation, lung nodule candidates are detected by applying optimal thresholding to the dot-enhanced images (the algorithm is the same as the one used for lung thresholding, explained in the previous section). A rule-based analysis is then performed, based on initial measurements such as area, diameter and volume, to decide whether to keep or discard each detected candidate [92]. The advantage of rule-based analysis is that it eliminates objects that are too small or too large to be nodule candidates, which reduces the workload of the next stage. All segmented objects must meet the following basic size requirements to be considered good nodule candidates: the computed area must lie in the range 4 to 908 mm², equivalent to a diameter of 2.5 to 34 mm, and the volume must lie in the range 8 to 20580 mm³. After rule-based analysis, several features are extracted from the good nodule candidates and used to train the SVM classifier in the next step. Examples of detected nodule candidates are shown in Figure 3-7.
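The size rules can be sketched as a simple filter on each segmented candidate. The limits are the ones stated above; they are mutually consistent, since a circle of diameter 2.5 to 34 mm has an area of about 4.9 to 908 mm², and a sphere of the same diameters has a volume of about 8.2 to 20580 mm³. Assumptions: each candidate is a boolean (z, y, x) mask with known voxel spacing in mm, area is measured on the median slice of the candidate (as in the area definition given later in this chapter), diameter is the equivalent-circle diameter of that area, and `passes_size_rules` is a hypothetical name.

```python
import numpy as np

AREA_MM2 = (4.0, 908.0)
DIAMETER_MM = (2.5, 34.0)
VOLUME_MM3 = (8.0, 20580.0)

def passes_size_rules(candidate: np.ndarray, spacing) -> bool:
    """Rule-based size check; candidate is a boolean (z, y, x) mask, spacing = (dz, dy, dx) in mm."""
    dz, dy, dx = spacing
    zs = np.flatnonzero(candidate.any(axis=(1, 2)))      # slices containing the object
    if zs.size == 0:
        return False
    median_slice = candidate[zs[len(zs) // 2]]
    area = median_slice.sum() * dy * dx                  # mm^2 on the median slice
    diameter = 2.0 * np.sqrt(area / np.pi)               # equivalent-circle diameter, mm
    volume = candidate.sum() * dz * dy * dx              # mm^3
    return bool(AREA_MM2[0] <= area <= AREA_MM2[1]
                and DIAMETER_MM[0] <= diameter <= DIAMETER_MM[1]
                and VOLUME_MM3[0] <= volume <= VOLUME_MM3[1])
```

A 10 mm spherical candidate at 1 mm isotropic spacing passes, while a single-voxel speck or a 40 mm cube is discarded.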


Figure 3-7: Examples of detected candidates: (a) nodules, (b) non-nodules. Nodule diversity and the close resemblance of nodules to other anatomical structures in the lung region make detection more challenging and produce false positives, which are then reduced with the aid of a classifier.

The goal of this stage is to reduce the number of false positives per scan (FP/scan). It comprises two steps, feature extraction and classification, which are briefly described below.

3.3.2 Feature Extraction

Feature extraction reduces the original data to a set of characteristics that differentiate one input from others. Nodules have their own characteristics that differentiate them from other anatomical structures in the lung region [18].

Pulmonary nodule features can be broadly classified into three categories: gray-scale (intensity) features, shape (morphological) features and texture features. Together, these features characterize nodules comprehensively and can also be used in rule-based analysis. A wide variety of image features can play a role in the subsequent classification of suspected pulmonary nodules, but their contributions differ: some features play a vital role, while others may have no effect. This section lists the various features together with the related image characteristics of pulmonary nodules.


3.3.2.1 Shape Features

The 2D and 3D shape features of suspected pulmonary nodules are parameters that characterize nodule shape. They need to capture both the characteristics of the nodule itself and the differences between nodules and vessel cross-sections or vessel branches, bearing in mind that the appearance of a nodule can vary across image sequences. Shape features are closely related to the actual lesions and represent the nodules visually, which provides additional diagnostic reference information for doctors.

The morphology of the suspected pulmonary nodules is mainly from the set of the outline pixels

of the suspected pulmonary nodules and the range and changes of the contours can show the

lung pathology of the patients. In this dissertation, the morphological characteristics of

pulmonary nodules are expressed by selecting shape features which can reflect the specificity

of pulmonary nodule shape and have the properties of rotational invariance. This thesis

enumerates several shape features (2D and 3D), including area, volume, perimeter, image moments and central moments, centroid, major and minor axis lengths, elongation, circularity and compactness [17,19,30,92,108]. Area is the number of voxels in the median slice of the lung nodule, volume is the total number of voxels in the segmented pulmonary nodule, and perimeter is the number of voxels on the outline of the nodule. Image moments and central moments are used to compute shape information such as the centroid and the orientation of the region; central moments are invariant to translation, and suitably normalized combinations of them (geometric moment invariants) are also invariant to scale and rotation. The moments of each order of the gray-level distribution within the ROI describe the gray-scale characteristics of the region, and each order has a different meaning: the zero-order moment represents the mass of the region (its area, for a binary mask), the first-order moments give the coordinates of the regional centroid, the second-order moments describe the orientation of the region, and so on. The centroid is the center of the suspected pulmonary nodule. The major and minor axis lengths are those of the ellipse equivalent to the nodule region, and elongation is their ratio: the closer this ratio is to one, the rounder the region and the more likely it is to be a lung nodule, and vice versa. Circularity describes the macroscopic roundness of the suspected nodule, i.e. the degree to which its region deviates from a circle, expressed as the ratio of the radius of the inscribed circle to that of the circumscribed circle; the inscribed-circle radius is taken as twice the nodule area divided by its perimeter, and the circumscribed-circle radius as half the major axis of the equivalent ellipse. The closer the circularity is to one, the more likely the region is a pulmonary nodule, and vice versa. Compactness measures how closely the volume of the suspected nodule approaches that of a sphere and also reflects the smoothness of its contour: a compactness closer to one indicates a smoother boundary, whereas smaller values indicate more complex and rough edges. The expressions of these shape features are presented in the first column of Table 3-1.
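As an illustration of the 2D shape features, the sketch below computes area, centroid, second-order central moments, the equivalent-ellipse axis lengths, elongation, perimeter and circularity from a binary nodule mask, following the expressions listed in Table 3-1. It is a minimal NumPy sketch under stated assumptions, not the implementation used in this work: the function name `shape_features`, the restriction to a single 2D slice and the definition of the perimeter as the count of mask pixels with a 4-connected neighbour outside the mask are all assumptions.

```python
import numpy as np


def shape_features(mask):
    """2D shape features of a binary nodule mask (True = nodule pixel)."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)  # row (y) and column (x) coordinates of nodule pixels

    # Raw moments of the binary image: m00 is the area, m10/m00 and m01/m00 the centroid.
    m00 = float(xs.size)
    x0, y0 = xs.sum() / m00, ys.sum() / m00

    # Second-order central moments mu_pq = sum (x - x0)^p (y - y0)^q.
    dx, dy = xs - x0, ys - y0
    mu20, mu02, mu11 = np.sum(dx * dx), np.sum(dy * dy), np.sum(dx * dy)

    # Major and minor axis lengths of the equivalent ellipse (mu00 = m00).
    root = np.sqrt((mu20 - mu02) ** 2 + 4.0 * mu11 ** 2)
    a = 2.0 * np.sqrt(2.0 * (mu20 + mu02 + root) / m00)
    b = 2.0 * np.sqrt(2.0 * (mu20 + mu02 - root) / m00)

    # Perimeter: nodule pixels with at least one 4-neighbour outside the mask.
    p = np.pad(mask, 1)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    perimeter = int(np.count_nonzero(mask & ~interior))

    return {
        "area": m00,
        "centroid": (x0, y0),
        "major_axis": a,
        "minor_axis": b,
        "elongation": a / b,
        "perimeter": perimeter,
        "circularity": 4.0 * np.pi * m00 / perimeter ** 2,
    }
```

For a filled disk the sketch returns an elongation of exactly one, while a 10 × 20 pixel rectangle gives an elongation of about 2, as expected from the ratio of its sides.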

3.3.2.2 Intensity Features

Gray-scale (intensity) features mainly describe the distribution and frequency of gray levels in the image matrix, i.e. the value of each gray level and its number of occurrences in the lung parenchyma image. This is the basic principle from which several important intensity features of suspected pulmonary nodules can be extracted. A lung CT image is itself a large gray-scale image in which differences in gray values reflect the characteristics of different tissues; for example, regions of high gray level may correspond to pulmonary nodules or vascular tissue. Gray-scale features are the most direct features of CT images and are quite effective [57,92]. The expressions of the extracted intensity features are presented in the second column of Table 3-1.

Table 3-1: Extracted features of nodule candidates.

Shape features (first column)
Area [92]: $A = \sum_{o \in O_m} o$
Image moments [17]: $m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x, y)$
Central moments [17]: $\mu_{pq} = \sum_{x}\sum_{y} (x - x_0)^{p} (y - y_0)^{q} f(x, y)$
Centroid [17]: $x_0 = m_{10}/m_{00}, \quad y_0 = m_{01}/m_{00}$
Major axis length [30]: $a = 2\left[\frac{2\left(\mu_{20} + \mu_{02} + \sqrt{(\mu_{20} - \mu_{02})^{2} + 4\mu_{11}^{2}}\right)}{\mu_{00}}\right]^{1/2}$
Minor axis length [30]: $b = 2\left[\frac{2\left(\mu_{20} + \mu_{02} - \sqrt{(\mu_{20} - \mu_{02})^{2} + 4\mu_{11}^{2}}\right)}{\mu_{00}}\right]^{1/2}$
Elongation [18]: $E = \frac{a}{b}$
Perimeter [108]: $L(I) = \sum_{(x, y) \in C} I(x, y)$
Circularity [108]: $C = \frac{4\pi A}{L^{2}}$
Roundness [108]: $R = \frac{4A}{\pi L^{2}}$
Volume [18]: $Vol = \sum_{o \in O} o$
Compactness [18]: $Cmp = \frac{Vol}{\frac{4}{3}\pi r^{3}}$

Intensity features (second column)
Mean [57]: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Variance [57]: $S^{2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}{n - 1}$
Maximum value inside [92]: $I_{max} = \max(I)$
Minimum value inside [92]: $I_{min} = \min(I)$
Skewness [57]: $Skew = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^{3}}{(n - 1)s^{3}}$
Kurtosis [57]: $Kurt = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^{4}}{(n - 1)s^{4}}$

Texture features (third column)
Normalized GLCM [125]: $\hat{P}_{\delta}(i, j) = \frac{P_{\delta}(i, j)}{\sum_{i=0}^{L-1}\sum_{j=0}^{L-1} P_{\delta}(i, j)}$
Energy [126]: $ene = \sum_{i=0}^{L-1}\sum_{j=0}^{L-1} \hat{P}_{\delta}^{2}(i, j)$
Entropy [125]: $ent = -\sum_{i=0}^{L-1}\sum_{j=0}^{L-1} \hat{P}_{\delta}(i, j)\log \hat{P}_{\delta}(i, j)$
Inverse difference moment [125]: $idm = \sum_{i=0}^{L-1}\sum_{j=0}^{L-1} \frac{\hat{P}_{\delta}(i, j)}{1 + (i - j)^{2}}$
Contrast [125]: $con = \sum_{n=0}^{L-1} n^{2} \left\{ \sum_{i=0}^{L-1}\sum_{j=0}^{L-1} \hat{P}_{\delta}(i, j) \right\}_{|i - j| = n}$
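The intensity features of Table 3-1 can be computed from the gray values of the pixels (or voxels) inside a nodule candidate, as in the minimal sketch below. The skewness denominator $(n - 1)s^{3}$ is assumed by analogy with the kurtosis expression, $s$ being the sample standard deviation, and the function name is hypothetical.

```python
import numpy as np


def intensity_features(values):
    """First-order intensity statistics of the gray values inside a nodule."""
    x = np.asarray(values, dtype=float).ravel()
    n = x.size
    mean = x.sum() / n
    dev = x - mean
    var = np.sum(dev ** 2) / (n - 1)  # sample variance S^2
    s = np.sqrt(var)                  # sample standard deviation
    return {
        "mean": mean,
        "variance": var,
        "max": x.max(),
        "min": x.min(),
        # Skewness and kurtosis normalised by (n - 1) s^3 and (n - 1) s^4.
        "skewness": np.sum(dev ** 3) / ((n - 1) * s ** 3),
        "kurtosis": np.sum(dev ** 4) / ((n - 1) * s ** 4),
    }
```

For a CT slice `ct` and a boolean nodule mask `mask` (both hypothetical names), the features would be obtained with `intensity_features(ct[mask])`.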

3.3.2.3 Texture Features

Texture features are statistical descriptions of the spatial distribution of pixel gray values in an image. They are based on local statistics computed over multiple pixels, and therefore carry more information than the gray-scale values or the position of a region alone. When matching regional objects, texture features provide stronger discriminative ability. Moreover, as statistical features, texture features computed over several directions are largely insensitive to rotation and are strongly resistant to noise.

Texture models of CT images can also be used to distinguish benign from malignant pulmonary nodules, although this is beyond the scope of this dissertation; such models may help not only to characterize pulmonary nodules but also to predict their behavior. When detecting pulmonary nodules, physicians intuitively judge the brightness of the nodules (i.e., their gray values) and the shape of the suspected nodules, and this regional, qualitative pathological information is very important and meaningful for computer-assisted diagnosis. Texture features can quantify the smoothness and regularity of an image's appearance, so such values can be used in an auxiliary diagnosis system.

The basic matrix used in texture feature calculation is the gray-level co-occurrence matrix (GLCM), which captures both gray-scale statistics and the spatial distribution of gray levels [125]. Four texture parameters, namely energy, contrast, entropy and inverse difference moment, are calculated from the normalized gray-level co-occurrence matrix [125,126]. Their expressions are presented in the third column of Table 3-1, where $P_{\delta}(i, j)$ is the gray-level co-occurrence matrix, $i$ and $j$ are image gray levels, $L$ is the number of gray levels of the image (so that gray levels range from 0 to $L - 1$) and $\delta$ represents the spatial position relationship between the two pixels of a pair. Because the GLCM describes a regional property, it is computed for the directional angles 0°, 45°, 90° and 135°, with a pixel distance of 2. A brief description of each extracted texture feature is given in the following paragraphs.
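The GLCM construction described above can be sketched in NumPy as follows. This is an illustrative sketch, not the implementation used in this work: the function name `glcm`, the (dx, dy) offset convention (rows grow downwards, so 45° points up and to the right), the optional symmetric counting and the assumption that the input is already quantized to `levels` gray levels are all choices made for the example.

```python
import numpy as np


def glcm(img, levels, dx, dy, symmetric=False):
    """Normalized gray-level co-occurrence matrix for displacement (dx, dy).

    img holds integer gray levels in [0, levels - 1]; (dx, dy) is the column and
    row offset from a reference pixel to its neighbour.
    """
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    # Restrict to reference pixels whose neighbour also lies inside the image.
    r0, r1 = max(0, -dy), min(h, h - dy)
    c0, c1 = max(0, -dx), min(w, w - dx)
    ref = img[r0:r1, c0:c1]
    nbr = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1.0)  # count co-occurring (i, j) pairs
    if symmetric:
        P = P + P.T  # also count each pair in the (j, i) order
    return P / P.sum()


# Offsets (dx, dy) for 0, 45, 90 and 135 degrees at pixel distance d = 2.
d = 2
OFFSETS = {0: (d, 0), 45: (d, -d), 90: (0, -d), 135: (-d, -d)}
```

Calling `glcm(img, levels, *OFFSETS[angle])` for each of the four angles yields the four normalized matrices from which the texture features are computed.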

Energy indicates the uniformity of the gray-level distribution of the image pixels and the coarseness of the texture. It is the sum of the squares of the normalized gray-level co-occurrence matrix elements. If the GLCM elements are nearly equal, the energy is small; if a few elements are much larger than the others, i.e. the gray-level pairs in the nodule region are concentrated in a few values, the energy is large. Energy therefore indicates the uniformity of the region and the coarseness of its texture pattern.

Contrast indicates the sharpness of the gray-scale image and the depth of the texture grooves. Deeper texture grooves give a higher contrast and a clearer visual effect, whereas shallow grooves give a small contrast and a blurred effect. Contrast is small when the gray-level co-occurrence matrix is concentrated on similar gray values, i.e. near its diagonal, and becomes larger as more of the large matrix elements lie away from the diagonal.

Entropy measures the amount of information in the image. It is maximal when the pixel pairs are completely random, in which case all the elements of the gray-level co-occurrence matrix are essentially equal; the more dispersed the distribution of elements in the matrix, the larger the entropy. Entropy can therefore indicate the complexity of the texture in the image.

The inverse difference moment represents the homogeneity of the image texture and measures the degree of local change in it. A larger inverse difference moment indicates little variation between different regions of the texture, i.e. the texture is locally very uniform.
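The four texture descriptors of Table 3-1 can be computed from a normalized GLCM as in the sketch below. It assumes the natural logarithm in the entropy (the base is not stated above) and uses the fact that summing $n^{2}$ over the diagonals $|i - j| = n$ is the same as weighting every element by $(i - j)^{2}$; the function name is hypothetical.

```python
import numpy as np


def texture_features(P_hat):
    """Energy, contrast, entropy and IDM of a normalized GLCM P_hat (L x L)."""
    P_hat = np.asarray(P_hat, dtype=float)
    L = P_hat.shape[0]
    i, j = np.indices((L, L))
    energy = np.sum(P_hat ** 2)
    # Summing n^2 over the diagonals |i - j| = n equals summing (i - j)^2 * P_hat.
    contrast = np.sum((i - j) ** 2 * P_hat)
    nz = P_hat > 0  # treat 0 * log(0) as 0
    entropy = -np.sum(P_hat[nz] * np.log(P_hat[nz]))
    idm = np.sum(P_hat / (1.0 + (i - j) ** 2))
    return {"energy": energy, "contrast": contrast, "entropy": entropy, "idm": idm}
```

For a uniform 4 × 4 GLCM the sketch gives energy 1/16 and entropy ln 16, whereas a purely diagonal GLCM gives energy 0.25, contrast 0 and an inverse difference moment of 1, matching the qualitative behaviour described above. How the results for the four directions are combined (for example, by averaging) is not stated here and would be a further assumption.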

3.3.3 Pulmonary Nodules Feature Selection

Hundreds of features of suspected pulmonary nodules have been reported in the literature. It is of critical importance to select descriptive features that have a considerable effect on the detection efficiency of the system: some features have no effect, while others play a very important role in auxiliary diagnosis, so optimizing the feature set is a key issue. We have selected a hybrid set of lung nodule features, obtained through experimentation and correlation analysis [127]. Features derived from similar basic elements are expected to be more strongly correlated than features of different kinds; therefore, features belonging to the same one of the three main feature types should show relatively high correlation. The correlation of two features is the degree of association between them, and quantifying this degree makes it amenable to computer processing.
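The correlation analysis can be illustrated with a short sketch based on Pearson's product-moment coefficient, defined in Eq. (3.21). It computes the coefficient directly for two samples and, for a whole feature matrix, with `np.corrcoef`; the 0.9 threshold used to flag strongly correlated (potentially redundant) feature pairs and the function names are hypothetical choices for illustration, not values taken from this work.

```python
import numpy as np


def pearson(a, b):
    """Pearson product-moment correlation coefficient r_AB of two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    return np.sum(da * db) / np.sqrt(np.sum(da ** 2) * np.sum(db ** 2))


def correlated_pairs(X, names, threshold=0.9):
    """Correlation matrix of the feature columns of X and the strongly correlated pairs."""
    R = np.corrcoef(X, rowvar=False)  # R[i, j] = pearson(X[:, i], X[:, j])
    pairs = [(names[i], names[j], R[i, j])
             for i in range(len(names)) for j in range(i + 1, len(names))
             if abs(R[i, j]) >= threshold]
    return R, pairs
```

Applied to the features of Table 3-2, a threshold of this kind would flag pairs such as circularity and compactness.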

Correlation can be divided into linear and nonlinear correlation; this thesis mainly considers linear correlation. Among the many statistical methods available, one of the most commonly used is Pearson's product-moment correlation, which is used in this thesis to compute the correlation between features. The correlation coefficient of any two samples A and B is calculated as [127]:

$$r_{AB} = \frac{\sum_{i}(a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i}(a_i - \bar{a})^{2}}\,\sqrt{\sum_{i}(b_i - \bar{b})^{2}}} \qquad (3.21)$$

where $\bar{a}$ and $\bar{b}$ represent the means of A and B, respectively. This dissertation collects and collates the characteristic information of the suspected pulmonary nodules and extracts samples from it for correlation analysis, as shown in Table 3-2.

Table 3-2: Feature Correlation Information

Correlation  Variance    Area        Major axis  Circularity Compactness Energy      Contrast    Entropy     IDM
Variance      1.000       0.598       0.665      -0.512      -0.471      -0.326      -0.072       0.372      -0.175
Area          0.598       1.000       0.951      -0.385      -0.397      -0.365      -0.055       0.484      -0.287
Major axis    0.665       0.951       1.000      -0.569      -0.571      -0.429      -0.124       0.531      -0.191
Circularity  -0.512      -0.385      -0.569       1.000       0.975       0.229      -0.105      -0.228       0.031
Compactness  -0.471      -0.397      -0.571       0.975       1.000       0.134      -0.155      -0.142       0.113
Energy       -0.326      -0.365      -0.429       0.229       0.134       1.000       0.043      -0.964      -0.281
Contrast     -0.072      -0.055      -0.124      -0.105      -0.155       0.043       1.000      -0.068      -0.161
Entropy       0.372       0.484       0.531      -0.228      -0.142      -0.964      -0.068       1.000       0.387
IDM          -0.175      -0.287      -0.191       0.031       0.113      -0.281      -0.161       0.387       1.000

As Table 3-2 shows, the correlation values lie in the range [-1, 1], and the closer a value is to 0, the less related the two features are. Circularity and compactness are highly correlated (0.975), as are area and major axis length (0.951) and, negatively, energy and entropy (-0.964), whereas the correlations among the other features are relatively low.

After the correlation analysis, rigorous experimentation was carried out to select the feature set that gave the best results in classifying lung nodule candidates. Our approach was to start from a large initial set of features representing the state of the art in features used by the most successful published CAD systems. From this initial pool, we carried out feature selection and trimmed the feature set down to the optimal subset for nodule detection, considering both the sensitivity and the FP/scan. The extracted nodule features can be broadly classified into shape, intensity and texture related quantities, as shown in Table 3-1. These features were extracted from all the lung nodule candidates and used for classification.

3.4 Classification of pulmonary nodules

3.4.1 Support Vector Machine Classifier

Once the feature vectors have been formed, they are used as the input for classification and false positive reduction. In our proposed method, we have used an SVM classifier because it is computationally efficient and gives better results [128,129]. We also experimented with other supervised classifiers, namely K-Nearest-Neighbour (KNN) [130], Decision Tree [131] and Linear Discriminant Analysis (LDA) [132], but the results clearly indicated the superiority of SVM over them. The SVM maps data that are not linearly separable into a high-dimensional space in which they can be separated by a hyperplane, and then classifies true and false pulmonary nodules with that hyperplane. The main reason for preferring SVM in this dissertation is that nodule classification is a binary problem, and SVM performs considerably well when there are only two classes to predict. SVM can also efficiently perform non-linear classification using the kernel trick [128]. There are four main types of kernel functions:

(1) Linear kernel function can be expressed as:

$$K(x_i, x_j) = x_i^{T} x_j \qquad (3.22)$$

(2) Polynomial kernel function can be expressed as:

$$K(x_i, x_j) = (\gamma x_i^{T} x_j + r)^{d} \qquad (3.23)$$

(3) Gaussian Radial Basis Function (RBF) kernel function can be expressed as:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right) \qquad (3.24)$$

(4) Hyperbolic Tangent kernel function can be expressed as:

$$K(x_i, x_j) = \tanh(\gamma x_i^{T} x_j + r) \qquad (3.25)$$

where γ, r and d are kernel parameters. Among the four kernel functions, the RBF kernel performs considerably well because it can map a low-dimensional, linearly inseparable space into a high-dimensional separable one. An RBF-kernel SVM requires only two parameters to be determined, the penalty factor and the kernel parameter γ, so the RBF kernel is simpler than the polynomial and hyperbolic tangent kernels, which also involve r (and d); it also handles the irregular classification information of lung nodules better.
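The four kernel functions of Eqs. (3.22) to (3.25) can be written as Gram-matrix computations in NumPy, as sketched below for row-wise feature matrices X and Y. The default parameter values are placeholders, not the tuned values reported in Chapter 4, and the function names are hypothetical.

```python
import numpy as np


def linear_kernel(X, Y):
    return X @ Y.T                                  # Eq. (3.22)


def polynomial_kernel(X, Y, gamma=1.0, r=1.0, d=3):
    return (gamma * (X @ Y.T) + r) ** d             # Eq. (3.23)


def rbf_kernel(X, Y, gamma=0.125):
    # Squared Euclidean distances ||x_i - x_j||^2 for all pairs of rows.
    sq = (np.sum(X ** 2, axis=1)[:, None] + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * (X @ Y.T))
    return np.exp(-gamma * np.maximum(sq, 0.0))     # Eq. (3.24)


def tanh_kernel(X, Y, gamma=0.01, r=0.0):
    return np.tanh(gamma * (X @ Y.T) + r)           # Eq. (3.25)
```

Setting d = 3 or d = 2 in the polynomial kernel presumably corresponds to the SVM-Cubic and SVM-Quadratic models evaluated in Chapter 4.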

In this dissertation, we have used polynomial and radial basis function kernels. The penalty factor C and the kernel scale parameter γ have been optimized using an exhaustive grid search over the ranges $C = 10^{0}, \ldots, 10^{2}$ (in steps of 1) and $\gamma = 2^{-3}, \ldots, 2^{3}$ (in steps of 0.2)². Different (C, γ) pairs are tried and the pair with the best cross-validation accuracy is picked [133,134]. This optimized pair of parameters is then used to train the classifier using only the training data.
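A minimal sketch of the exhaustive grid search follows. It assumes a hypothetical callback `cv_score(C, gamma)` that trains an SVM with the given pair and returns its mean k-fold cross-validation accuracy; the grids reproduce the ranges and step sizes stated above, with C running from 1 to 100 in steps of 1 and γ from 0.125 in steps of 0.2 up to 7.925, the last step below 2³ = 8.

```python
import numpy as np


def grid_search(cv_score, C_values, gamma_values):
    """Return the (C, gamma) pair with the highest cross-validation score."""
    best_pair, best_score = None, -np.inf
    for C in C_values:
        for gamma in gamma_values:
            score = cv_score(C, gamma)
            if score > best_score:  # keep the first pair reaching the maximum
                best_pair, best_score = (C, gamma), score
    return best_pair, best_score


# Grids as described in the text: C = 10^0 ... 10^2 and gamma = 2^-3 ... 2^3.
C_GRID = np.arange(1, 101, 1)
GAMMA_GRID = np.arange(2.0 ** -3, 2.0 ** 3, 0.2)
```

With the reduced ranges of footnote 2, the grids would shrink to `np.arange(1, 11)` and `np.arange(0.125, 4.0, 0.2)`, cutting the number of `cv_score` calls from 4000 to 200.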

To train the classifier, we use the data annotated by the radiologists. The number of nodule samples is normally much smaller than the number of non-nodules, which degrades the performance of a classifier. To remove this bias, we balanced the dataset by randomly selecting equal numbers of nodules and non-nodules. The balanced dataset is then randomly split into training and test datasets: 70% of the data is used for training and 30% is held out as a test set for the final evaluation of the system. In the training phase, we used a k-fold cross-validation scheme for model selection and validation, with k = 2, 5 and 7 (the values reported in Table 4-1). In k-fold cross-validation, the training set is randomly divided into k equal-sized subsamples. One subsample is selected as validation data for model assessment and the remaining k − 1 subsamples are used to train the classifier; this process is repeated k times, and the k results from the folds are averaged to produce a single estimate. The advantage of this scheme is that every sample is used for both training and validation, and for validation exactly once.
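The balancing, 70/30 split and k-fold partitioning described above can be sketched as follows. The sketch assumes labels 1 for nodules and 0 for non-nodules and random undersampling of the majority (non-nodule) class; the function names and the fixed random seed are illustrative.

```python
import numpy as np


def balance_and_split(y, train_frac=0.7, seed=0):
    """Randomly undersample the majority class, then split into train/test indices."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)  # nodules, non-nodules
    n = min(pos.size, neg.size)
    keep = np.concatenate([rng.choice(pos, n, replace=False),
                           rng.choice(neg, n, replace=False)])
    rng.shuffle(keep)
    n_train = int(round(train_frac * keep.size))
    return keep[:n_train], keep[n_train:]


def kfold_indices(n, k, seed=0):
    """Yield (train, validation) position arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

`kfold_indices` returns positions into the training index array, so with `tr, te = balance_and_split(y)` fold i trains on `tr[train]` and validates on `tr[val]`, while `te` is left untouched until the final evaluation.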

In the training phase, the input to the classifier consists of the feature vectors and the known class labels. Once the classifier is trained and its hyper-parameters are tuned, its final evaluation is carried out on the test set only: the 30% of the data held out initially is used for the final evaluation of the classifier, and the corresponding results are reported in the next chapter. Note that in this phase the input to the classifier consists of the feature vectors only.

² After the optimization of C and γ for SVM-Gaussian, the ranges of C and γ were reduced to $C = 10^{0}, \ldots, 10^{1}$ and $\gamma = 2^{-3}, \ldots, 2^{2}$ to reduce the number of iterations.

Note also that feature selection was performed on the training set only: once the optimal feature set for nodule detection, considering both the sensitivity and the FP/scan, had been determined from the training data, it was fixed and applied unchanged to the test set.

The performance of a classifier can be measured with standard performance metrics, mainly sensitivity, specificity, accuracy and receiver operating characteristic (ROC) curves [135]. ROC curves are obtained by plotting the sensitivity against the false positive rate (1 − specificity) for different threshold values, and the area under the ROC curve (AUC) summarizes the performance of the classifier. These metrics are calculated as follows:

$$\text{sensitivity} = \frac{TP}{TP + FN} \qquad (3.26)$$

$$\text{specificity} = \frac{TN}{TN + FP} \qquad (3.27)$$

$$\text{accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (3.28)$$

where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. In summary, the main steps of the classification stage are:

(1) Formation of the feature vectors that serve as input to the classifier in the training phase.
(2) Optimization of the penalty factor and kernel scale parameters using grid search.
(3) Balancing of the dataset by randomly selecting equal numbers of nodules and non-nodules, and splitting it into training and test datasets.
(4) Training of the classifier using the k-fold cross-validation scheme for model selection and validation, using the training data only.
(5) Testing of the trained classifier on the initially held-out test data only, to obtain the final classification results.

This process is shown in Figure 3-8.

Figure 3-8 Flow Chart of Classification Process (blocks labelled Feature Vector, Feature Optimization, Rule based Classifier, SVM Classifier and Classification Results)
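Eqs. (3.26) to (3.28) and the area under the ROC curve can be computed from true labels, predicted labels and classifier scores as in the sketch below. The AUC uses the rank-based (Mann-Whitney) formulation, which equals the area under the empirical ROC curve obtained by sweeping the decision threshold; the function names are hypothetical.

```python
import numpy as np


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy, Eqs. (3.26)-(3.28)."""
    t = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    tp, tn = np.sum(t & p), np.sum(~t & ~p)
    fp, fn = np.sum(~t & p), np.sum(t & ~p)
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / (tp + fp + tn + fn)}


def roc_auc(y_true, scores):
    """AUC: probability that a random nodule scores higher than a random non-nodule."""
    t = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    diff = s[t][:, None] - s[~t][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
```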

3.4.1.1 Sample Experimental Results

Figure 3-9 shows three different classification cases: (a), (d) and (g) show the original sequence, (b), (e) and (h) show the pulmonary nodules marked by the system, and (c), (f) and (i) show the annotation of one of the four expert radiologists from the pulmonary nodule XML annotation file (each radiologist annotates the contour differently). Panel (c) shows the annotation of a pulmonary nodule with a diameter of about 3 mm, (f) shows the annotation of a pulmonary nodule more than 10 mm in diameter, and (i) shows the annotation of a pulmonary nodule with a very small diameter. Comparing the images, the annotation in (c) is basically consistent with the system's marking, whereas the annotated contour differs slightly in (f) and the annotation is inconsistent in (i).

Figure 3-9: Pulmonary Nodule Classification Results

Chapter 4: RESULTS AND DISCUSSION

This chapter presents the results of our proposed method and the accompanying discussion. The first section briefly analyzes the LIDC dataset used in this work and the DICOM image format. The next section presents the experimental environment, including the graphical user interface developed for the user, along with the results and corresponding analysis of all of its modules. The last section presents the results of our proposed method and their comparison with other CAD systems.

4.1 Dataset and Hit Criteria

We have carried out an extensive evaluation of our proposed system on the Lung Image Database Consortium (LIDC) dataset [27], a publicly available database accessible from The Cancer Imaging Archive (TCIA). We considered 850 scans³ (LIDC-IDRI-0001 to LIDC-IDRI-0844) of this dataset, which contain nodules of size 3-30 mm fully annotated by four expert radiologists in two consecutive sessions. The total number of nodules in these 850 CT scans is 2242. Each CT scan consists of 150-300 slices, each of 512 × 512 pixels with 4096 gray levels (in HU). The pixel spacing ranges from 0.78 mm to 1 mm, and the reconstruction interval varies from 1 mm to 3 mm.

We have considered all the nodules (i.e. 2242), at all agreement levels (AL) among the observers, in the evaluation of our proposed system. They comprise nodules marked by all four radiologists (full agreement between observers, denoted AL4), nodules marked by three out of four (majority agreement, denoted AL3), and nodules marked by two or only one radiologist (minority agreement, denoted AL2 and AL1, respectively).

³ Cases LIDC-IDRI-0132, 0151, 0315, 0332, 0355, 0365, 0442 and 0484 appear twice as distinct cases in the dataset, and cases LIDC-IDRI-0238 and 0585 do not exist in the dataset.

In our evaluation, a detected candidate is counted as a nodule if its distance to any nodule in the dataset is smaller than 1.5 times the radius of that nodule; we call this a near hit. This factor was determined experimentally and has also been used in other studies [12,18,30]. A detected candidate that makes a hit is counted as a true positive; otherwise it is counted as a false positive. If there are multiple hits on the same reference nodule, only the candidate that overlaps most with the reference nodule's merged manual segmentation is counted as a true positive, and the other candidates are ignored, i.e. they are counted neither as true positives nor as false positives for scoring purposes.
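The near-hit scoring rule can be sketched as follows. The sketch assumes that candidate and reference positions are centroids in millimetres and that the overlap of each candidate with each reference nodule's merged manual segmentation has already been computed; these inputs and the function name are illustrative assumptions. It returns the number of detected reference nodules (true positives), the number of candidates that hit no nodule (false positives) and the number of hitting candidates that are ignored.

```python
import numpy as np


def score_detections(det_xyz, ref_xyz, ref_radius, overlap, factor=1.5):
    """Count true positives, false positives and ignored candidates (near-hit rule).

    overlap[d, r] is the overlap of candidate d with reference nodule r's merged
    manual segmentation; it only decides which of several hits is kept.
    """
    det = np.asarray(det_xyz, dtype=float)
    ref = np.asarray(ref_xyz, dtype=float)
    radius = np.asarray(ref_radius, dtype=float)
    overlap = np.asarray(overlap, dtype=float)
    # Euclidean distance from every candidate to every reference nodule centre.
    dist = np.linalg.norm(det[:, None, :] - ref[None, :, :], axis=2)
    hit = dist < factor * radius[None, :]          # near hits
    best = {}
    for r in range(ref.shape[0]):
        hitters = np.flatnonzero(hit[:, r])
        if hitters.size:                           # keep the most-overlapping candidate
            best[r] = int(hitters[np.argmax(overlap[hitters, r])])
    n_tp = len(best)                               # reference nodules that were detected
    n_fp = int(np.sum(~hit.any(axis=1)))           # candidates hitting no nodule
    n_ignored = int(np.sum(hit.any(axis=1))) - len(set(best.values()))
    return n_tp, n_fp, n_ignored
```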

4.1.1 DICOM Resources

The LIDC scans are in the DICOM (Digital Imaging and Communications in Medicine) format [14], a medical imaging standard widely used across the globe. We have analyzed the characteristics of DICOM image sequences (details can be found in Appendix A), which is another important aspect and lays a foundation for our proposed algorithm.

With the progress of science and technology, developed countries have increased their investment in two major areas, education and health care, and the level of health care is one of the most important reflections of a nation's scientific and technological state. In the past, medical equipment manufacturers faced various problems because of their different image formats and non-uniform equipment interfaces, such as communication between devices, image sharing and the continuity of medical development. These problems led to an imbalance and waste of medical resources.

As a result, the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) formed a committee and ultimately developed DICOM, a unified standard for the storage, display and transmission of medical images, which describes in detail the interaction modes, medical image formats and communication fundamentals among equipment from different manufacturers. The coding structure of the DICOM CT image and an analysis of the XML information file provided by LIDC as a reference can be found in Appendices A and B, respectively.

4.2 Experimental Environment

We have developed a fully automated system for lung nodule detection, whose user interface is shown in Figure 4-1. The system has five main modules and some supplementary functions for user support, each of which is described in the following subsections.

Figure 4-1 User Interface of Lung CAD

4.2.1 Image Preprocessing Module

The image preprocessing module, shown in Figure 4-2, provides the image denoising function, for which median filtering is used. This module removes noise present in the image, and its output serves as the input to the next module.
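Median filtering replaces each pixel with the median of its neighbourhood, which removes isolated impulse noise while preserving edges. A minimal NumPy sketch is given below; the 3 × 3 window and edge-replicating border handling are assumptions, since the window size is not stated here, and the function name is hypothetical. SciPy's `scipy.ndimage.median_filter` provides an equivalent, optimized routine.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(img, size=3):
    """Median filter with a size x size window; borders padded by edge replication."""
    img = np.asarray(img, dtype=float)
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    return np.median(windows, axis=(-2, -1))
```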

Figure 4-2 Sample Output of Image Pre-processing Module

4.2.2 Lung Segmentation Module

The lung segmentation module, shown in Figure 4-3, consists of two main functions: (i) lung segmentation and (ii) edge smoothing. The lung segmentation function consists of the series of steps described in detail earlier, and its output is the lung region segmented from the pre-processed CT image. The second function smooths the lung contour using a rolling ball algorithm so that any juxta-pleural nodules on the lung contour are included.

Figure 4-3 Sample Output of Lung Segmentation Module

4.2.3 Image Enhancement Module

The image enhancement module, shown in Figure 4-4, has one main function, the “Multi-scale Dot Enhancement Filter”. This function takes the output of the lung segmentation module as input and uses our proposed multi-scale enhancement filter to enhance low-density nodules, increasing the overall contrast of the regions of interest (ROIs) and weakening other irrelevant structures in the lung region.

Figure 4-4 Sample Output of Image Enhancement Module

The following paragraphs present and discuss sample results of the image pre-processing, lung segmentation and enhancement modules. These samples are taken from different scans of the LIDC-IDRI dataset used in this dissertation. Figure 4-5 presents top, middle and bottom slices of different scans, and its first column shows the results of the image pre-processing module on these slices. The pre-processing module is responsible for grayscale conversion of the input image and for removing noise from it through median filtering. Panels (a), (d), (g) and (j) of Figure 4-5 show the pre-processing results on middle slices of different scans, (m) and (p) on top slices, and (s), (v) and (y) on bottom slices. These results show that the pre-processing module efficiently removes noise from the input CT images.

The second column of Figure 4-5 presents the results of the lung segmentation module on these slices. The lung segmentation module segments the lung from the pre-processed input image and consists of several operations, including thresholding, background removal, lung contour smoothing and lung separation, which were explained in detail in the previous chapter. Here we present the final segmented lung, which is the main output of this module. Panels (b), (e), (h) and (k) of Figure 4-5 show the lung segmentation results on middle slices of different scans, (n) and (q) on top slices, and (t), (w) and (z) on bottom slices. The segmentation module works well for the top and bottom slices, as shown in (n), (q) and (t), (w), (z), respectively. For middle slices, we present two scenarios: (i) the segmentation module segments the lung without any artifacts, and (ii) it segments the lung with some artifacts. Panel (b) of Figure 4-5 shows a segmented lung produced by the module; comparing it with the pre-processed image in (a) reveals almost no artifacts in the segmented lung. Similarly, no artifacts are found in (h) when it is compared with the pre-processed image in (g). The second scenario is presented in (e): comparing this output with the pre-processed image in (d) reveals some artifacts in the segmented lung. In the middle of the right side of the left segmented lung, a small closing is observed between two disjoint regions; this artifact is highlighted with a red circle in the figure. The same artifact appears in (k) when it is compared with the pre-processed image in (j), and it is also highlighted with a red circle.

The third column of Figure 4-5 presents the results of the image enhancement module on these slices. The image enhancement module enhances low-density nodules, which might otherwise remain undetected and degrade the performance of the system. The enhancement method takes into consideration that nodules are spherical whereas vessels are elongated, line-like structures; the module therefore enhances dot-like spherical objects. Panels (c), (f), (i) and (l) of Figure 4-5 show the enhancement results on middle slices of different scans, (o) and (r) on top slices, and (u), (x) and (zz) on bottom slices. The enhancement module works well in almost all cases, irrespective of the slice location and of differences between scans whose different imaging protocols result in different density levels. Panel (zz) of Figure 4-5 presents one case in which the enhancement result is not satisfactory in a specific portion of a bottom slice, which is highlighted in the figure.

Fig. 4-5 Sample Results of Image Preprocessing, Segmentation and Enhancement Modules on Top, Bottom and Middle Slices of Different Scans (panels (a) to (zz), three panels per row: pre-processing, segmentation and enhancement).

4.2.4 Lung Nodule Segmentation Module

The “ROI Extract” module is responsible for lung nodule candidate detection and feature extraction, and its output serves as the input to the next module. This module takes the enhanced image as input and detects the lung nodule candidates using optimal thresholding followed by a rule-based analysis that selects only the good nodule candidates. Finally, features are extracted from the good nodule candidates and passed as input to the next module.
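The thresholding procedure is not detailed in this section. As an illustration only, the sketch below assumes the classic iterative (intermeans) scheme often referred to as optimal thresholding, in which the threshold is repeatedly set to the midpoint of the mean intensities below and above it until it stabilizes; the tolerance, iteration limit and function name are hypothetical.

```python
import numpy as np


def optimal_threshold(img, tol=0.5, max_iter=100):
    """Iterative intermeans threshold, assumed here as the 'optimal thresholding' step."""
    x = np.asarray(img, dtype=float).ravel()
    t = x.mean()                                   # initial guess: global mean
    for _ in range(max_iter):
        low, high = x[x <= t], x[x > t]
        if low.size == 0 or high.size == 0:        # image is (nearly) uniform
            break
        t_new = 0.5 * (low.mean() + high.mean())   # midpoint of the two class means
        converged = abs(t_new - t) < tol
        t = t_new
        if converged:
            break
    return t
```

Candidate pixels would then be selected with `img > optimal_threshold(img)` before the rule-based analysis.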

4.2.5 Lung Nodule Classification Module

This module produces the final output of the lung CAD system. It takes the extracted features as input, trains the classifier with the training data and gives the classification results on the test data.

4.2.6 Supplementary Functions Module

Apart from the main modules, we have also developed some supplementary functions to increase interactivity and support the user. These functions are placed at the top center and the bottom left of the interface, as shown in Figure 4-1. The top-center functions handle the input and output images, providing image sequencing, the coordinate information in the output image and options to read, write and reset the image. The functions at the bottom left allow the user to jump to the first or last slice of the CT sequence. There are also two specialized functions, “Broadcast” and “Automatic”: the former scans through all the images with one click, while the latter shows the final output of the CAD system with one click.

4.3 Classification Results of SVM with Different Kernel Functions

Our system detects 2112 of the 2242 nodules together with 38682 non-nodule candidates, which gives a detection rate of 94.20% at 45.51 FP/scan. Note that these non-nodules are further reduced by the classifier at the classification stage. The results are summarized in the following tables. Table 4-1 shows the classification results of SVM with different kernel functions on the test dataset when using 2, 5 and 7-fold cross-validation schemes in the training phase. The penalty factor and kernel parameters of these models were optimized using grid search. For SVM-Gaussian, the pair (C = 1, γ = 0.125) achieved the maximum cross-validation accuracy, as shown in Figure 4-6, and was used to train the model. For SVM-Cubic, the pair (C = 10, γ = 1.325) achieved the maximum cross-validation accuracy, as shown in Figure 4-7, and was used to train the model. Lastly, for SVM-Quadratic, the pair (C = 9, γ = 0.525), which achieved the maximum cross-validation accuracy, was selected to train the model; this optimized pair is shown in Figure 4-8.
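The candidate-detection figures quoted at the start of this section follow directly from the nodule and scan counts given in Section 4.1:

$$\text{detection rate} = \frac{2112}{2242} \approx 94.20\%, \qquad \text{FP/scan} = \frac{38682}{850} \approx 45.51$$

FP/scan is therefore an average number of false candidates per scan, not a percentage.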

Table 4-1: Classification Results of SVM on test dataset with different kernel functions using 2, 5 and 7-fold Cross Validation Scheme in training phase.

k-fold  Classifier      AUC    Accuracy (%)  Sensitivity (%)  Specificity (%)  FPs/Scan
2-Fold  SVM-Gaussian    0.995  97.10         98.15            96.01            2.19
        SVM-Cubic       0.943  90.10         92.12            88.63            3.50
        SVM-Quadratic   0.907  83.40         80.21            85.73            4.27
5-Fold  SVM-Gaussian    0.995  97.40         98.32            96.46            1.88
        SVM-Cubic       0.949  90.10         92.28            88.31            3.36
        SVM-Quadratic   0.916  83.80         80.90            86.16            3.98
7-Fold  SVM-Gaussian    0.994  97.40         98.41            96.40            1.91
        SVM-Cubic       0.955  90.90         92.67            89.38            3.11
        SVM-Quadratic   0.919  83.20         80.29            85.59            3.76

Figure 4-6: Grid Search Results for SVM-Gaussian showing the pair of parameters (C = 1 and γ = 0.125) with best cross validation accuracy.

Figure 4-7: Grid Search Results for SVM-Cubic showing the pair of parameters (C = 10 and γ = 1.325) with best cross validation accuracy.

Figure 4-8: Grid Search Results for SVM-Quadratic showing the pair of parameters (C = 9 and γ = 0.525) with best cross validation accuracy.

Our system achieved a sensitivity of 98.41 % and an accuracy of 97.40 % using SVM with the Gaussian kernel function. The Gaussian kernel function outperforms the other kernel functions in terms of accuracy. The 5-fold and 7-fold cross validation schemes yield the same maximum accuracy (97.40 %), with the 7-fold scheme giving the highest sensitivity. The performance of the system with the Gaussian kernel function remains almost the same in the 2 and 5-fold cross validation schemes, with only slight differences in the metrics. The other two kernel functions, SVM-Cubic and SVM-Quadratic, achieve highest sensitivities of 92.67 % and 80.90 %, respectively.
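The metrics reported in these tables can be computed from confusion-matrix counts. The sketch below uses the standard definitions:

- sensitivity = TP/(TP+FN);
- specificity = TN/(TN+FP);
- accuracy = (TP+TN)/total;
- FPs/scan = FP/number of scans.

That the tables use exactly these definitions is an assumption. As a check, the sketch reproduces the candidate-detection figures quoted at the start of this section: 2112 of 2242 nodules detected, with 38682 non-nodules in 850 scans.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of true/false positives and negatives for one evaluation."""
    tp: int
    fp: int
    tn: int
    fn: int

    def sensitivity(self) -> float:
        # Fraction of true nodules that are detected (recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of non-nodules that are correctly rejected.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def fps_per_scan(self, n_scans: int) -> float:
        # Average number of false positives per CT scan.
        return self.fp / n_scans


if __name__ == "__main__":
    # Candidate-detection stage: no true negatives are counted at this stage.
    detection = Confusion(tp=2112, fp=38682, tn=0, fn=2242 - 2112)
    print(f"detection rate: {100 * detection.sensitivity():.2f} %")  # 94.20 %
    print(f"FPs/scan: {detection.fps_per_scan(850):.2f}")            # 45.51
```

Note that FPs/scan is an average count per scan, not a percentage.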

Figure 4-9: ROC curves of the SVM classifier with different kernel functions using (a) 2-Fold Scheme, (b) 5-Fold Scheme and (c) 7-Fold Scheme. SVM-Q: Quadratic kernel function, SVM-G: Gaussian kernel function, SVM-C: Cubic kernel function.

ROC curves have been drawn to visualize the classifier’s performance. Figure 4-9 shows the ROC curves of the SVM classifier with different kernel functions using the 2, 5 and 7-fold cross validation schemes. The confidence interval for these and subsequent curves was set to 95 %. SVM with the Gaussian kernel function outperforms the other two kernel functions, while the Quadratic kernel function shows the lowest performance.
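A ROC curve is obtained by sweeping a decision threshold over the classifier’s output scores. At each step, the false positive rate and true positive rate are recorded, and the AUC values in the tables summarize the resulting curve.

The sketch below is a generic illustration on toy scores, not the implementation used in this work. It computes the AUC in two ways that give the same value:

- by the trapezoidal rule;
- as the probability that a randomly chosen nodule scores higher than a randomly chosen non-nodule.

The 95 % confidence bands mentioned above are not computed here.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping a threshold from high to low score."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive outscores random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]   # toy classifier outputs
    labels = [1, 1, 0, 1, 0, 0]                # 1 = nodule, 0 = non-nodule
    print(auc_trapezoid(roc_points(scores, labels)), auc_rank(scores, labels))
```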

4.4 Classification Results of SVM with Different Kernel Scale and Penalty Factor

In addition to performing grid search for the selection of (C, 𝛾), we have also experimented with different values of kernel scale and penalty factor, keeping one of them constant, to observe the effect of each parameter. Table 4-2 shows the classification results of SVM-Gaussian using different kernel scale values in the 2-fold cross validation scheme. We have evaluated our system using kernel scale values in the range 0.3 to 3, with the penalty parameter kept constant at 1. The system achieves its maximum accuracy at the initial value 𝛾=0.3, and performance decreases as the kernel scale increases. The lowest accuracy, 83.30 %, occurs at 𝛾=3.
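To see how this parameter acts, recall the kernels being compared. A common parameterization is assumed here; the exact form used in this work is not restated in this section.

- Gaussian kernel: K(x, y) = exp(−𝛾‖x − y‖²).
- Quadratic and cubic kernels: the polynomial kernel (𝛾 x·y + r)^d with d = 2 and d = 3.

Some toolboxes instead specify a kernel scale s that divides the inputs, which for the Gaussian kernel corresponds to 𝛾 = 1/s². Under the first convention, a larger value narrows the kernel. Under the second, a larger value widens it. Which convention applies to Table 4-2 is not stated, so the sketch below only illustrates the mechanism on two toy feature vectors.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def sq_dist(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """exp(-gamma * ||x - y||^2): larger gamma -> similarity decays faster."""
    return math.exp(-gamma * sq_dist(x, y))


def gaussian_kernel_scaled(x: Vector, y: Vector, scale: float) -> float:
    """Kernel-scale convention: inputs divided by `scale`, i.e. gamma = 1/scale**2."""
    return gaussian_kernel(x, y, 1.0 / scale ** 2)


def polynomial_kernel(x: Vector, y: Vector, degree: int,
                      gamma: float = 1.0, coef0: float = 1.0) -> float:
    """(gamma * <x, y> + coef0)^degree; degree 2 = quadratic, 3 = cubic."""
    return (gamma * dot(x, y) + coef0) ** degree


if __name__ == "__main__":
    a, b = [0.0, 0.0], [1.0, 0.0]   # two toy feature vectors, squared distance 1
    for g in (0.3, 1.0, 3.0):
        print(f"gamma={g}: K={gaussian_kernel(a, b, g):.3f}")
```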

Table 4-2: Classification Results of SVM-Gaussian on test dataset using different 𝛾 values and 2-fold Cross Validation Scheme in training phase.

Penalty Parameter (C) | Kernel Scale (𝛾) | AUC | Accuracy (%) | Sensitivity (%) | Specificity (%) | FPs/scan
1 | 0.3 | 0.994 | 97.00 | 98.04 | 95.87 | 1.31
1 | 0.5 | 0.992 | 96.80 | 97.92 | 95.53 | 1.56
1 | 1 | 0.989 | 96.40 | 97.86 | 95.26 | 1.79
1 | 1.3 | 0.974 | 93.30 | 94.12 | 92.67 | 2.00
1 | 1.5 | 0.964 | 91.20 | 91.14 | 91.34 | 2.36
1 | 1.8 | 0.950 | 88.70 | 86.63 | 90.40 | 2.62
1 | 2 | 0.942 | 87.40 | 84.79 | 89.57 | 2.85
1 | 2.5 | 0.922 | 84.30 | 79.98 | 87.93 | 3.29
1 | 3 | 0.913 | 83.30 | 79.60 | 86.41 | 3.71

Table 4-3 shows the classification results of SVM-Gaussian using different penalty parameter values in the 2-fold cross validation scheme. The penalty parameter varies from 1 to 4, while the kernel scale is kept constant at 𝛾=1. The accuracy of the system increases with the penalty parameter and attains a maximum value of 97.00 % for C=4.

Table 4-3: Classification Results of SVM-Gaussian on test dataset using different C values and 2-fold Cross Validation Scheme in training phase.

Penalty Parameter (C) | Kernel Scale (𝛾) | AUC | Accuracy (%) | Sensitivity (%) | Specificity (%) | FPs/scan
1 | 1 | 0.989 | 96.40 | 97.86 | 95.26 | 1.79
2 | 1 | 0.991 | 96.70 | 98.12 | 95.45 | 1.74
3 | 1 | 0.991 | 96.90 | 98.23 | 95.07 | 1.84
4 | 1 | 0.992 | 97.00 | 98.32 | 95.26 | 1.79

Figure 4-10 (a) shows the ROC curves of the SVM classifier with Gaussian kernel function for different kernel scale values in the 2-fold cross validation scheme. Here 𝛾 varies from 0.3 to 3 and the penalty parameter is held constant, and the performance of the classifier decreases as 𝛾 increases. Figure 4-10 (b) shows the corresponding ROC curves for different penalty parameter values. Here C varies from 1 to 4 and the kernel scale is held constant, and the performance of the classifier remains almost the same, with only a minor improvement as C increases.

Figure 4-10: ROC curves of the SVM classifier with Gaussian kernel function using 2-Fold cross validation scheme with (a) different kernel scale 𝛾 values, varying from 0.3 to 3, and (b) different penalty parameter C values, varying from 1 to 4.
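The behaviour of the penalty parameter C in Table 4-3 can be read from the standard soft-margin SVM formulation, reproduced below for reference. That this is the exact formulation solved by the toolbox used in this work is an assumption.

The formulation uses labels yᵢ ∈ {−1, +1} and a feature map 𝜙 induced by the kernel K. C weights the slack variables 𝜉ᵢ that measure margin violations:

- a larger C penalizes training errors more heavily and yields a narrower margin;
- a smaller C tolerates more violations.

In the dual decision function, C appears as an upper bound on the multipliers 𝛼ᵢ, and the kernel K is where 𝛾 enters.

```latex
\begin{aligned}
&\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;\frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\\
&f(\mathbf{x}) = \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i y_i K(\mathbf{x}_i,\mathbf{x}) + b\Bigr),
\qquad 0 \le \alpha_i \le C .
\end{aligned}
```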

4.5 Classification Results of Different Classifiers

In addition to the SVM classifier, we have also evaluated our system using other supervised classifiers, namely K-Nearest-Neighbour, Decision Tree, Linear Discriminant and Boosted Tree. Table 4-4 shows the classification results of these classifiers using the 2-fold cross validation scheme. Among them, the Decision Tree performs best, achieving the highest accuracy and sensitivity, while the Linear Discriminant performs poorly, with the lowest sensitivity.

Table 4-4: Classification Results of different classifiers on test dataset using 2-fold Cross Validation Scheme in training phase.

Classifier | AUC | Accuracy (%) | Sensitivity (%) | Specificity (%) | FPs/scan
Decision Tree | 0.942 | 91.40 | 96.03 | 87.55 | 3.39
Linear Discriminant | 0.792 | 74.10 | 57.06 | 88.18 | 3.25
K-Nearest-Neighbour | 0.882 | 78.40 | 83.04 | 74.59 | 6.93
Boosted Tree-Ensemble | 0.959 | 89.60 | 91.67 | 87.93 | 3.29

4.6 Feature Ranking

Various features have been proposed in the literature to differentiate between nodules and other anatomical structures, but research on measuring the effectiveness of these features has been limited. In this dissertation, we have compared different classes of features to determine the most relevant feature class. Table 4-5 shows the classification results of SVM-Gaussian using different classes of features in the 2-fold cross validation scheme. Features from the Shape class show the best performance in terms of sensitivity and accuracy compared with the other feature classes. However, the results clearly show that it is very difficult to achieve high performance using a single feature class alone. A hybrid approach to feature selection therefore remains the better choice.

Table 4-5: Classification Results of SVM-Gaussian on test dataset using different feature classes and 2-fold Cross Validation Scheme in training phase.

Features | AUC | Accuracy (%) | Sensitivity (%) | Specificity (%) | FPs/scan
Intensity | 0.780 | 72.40 | 68.53 | 76.49 | 6.13
Shape | 0.902 | 84.60 | 80.59 | 87.87 | 3.62
Texture | 0.835 | 76.40 | 71.89 | 80.09 | 5.43

Figure 4-11 (a) shows the ROC curves for different feature classes using the SVM classifier with Gaussian kernel function in the 2-fold cross validation scheme. Features from the Shape class show the best performance compared with the other two feature classes. Figure 4-11 (b) shows the ROC curves for different classifiers in the 2-fold cross validation scheme. It is noteworthy that the Linear Discriminant classifier performs poorly compared with the other classifiers, having the lowest area under the curve.

Figure 4-11: (a) ROC curves of SVM classifier with Gaussian kernel function using 2-Fold cross validation scheme with different feature classes; (b) ROC curves of different classifiers using 2-Fold cross validation scheme.

4.7 Comparison with Other Systems

From the review of existing methods, we found that it is very hard to compare our results with previously published work. Published systems use non-uniform performance metrics and different evaluation criteria, including the dataset and the types of nodules considered. Despite this constraint, we have compared the performance of our proposed system with other lung CAD systems, as shown in Table 4-6. Our proposed system shows better performance than the other systems in terms of sensitivity and FP/scan. The systems closest in performance are Choi et al. [12], Messay et al. [92] and Akram et al. [57].

Choi et al. [12] proposed a novel shape-based feature extraction method. Eigenvalue decomposition of the Hessian matrix was used to obtain surface elements describing the local shape information of the target object, and features were formed from these surface elements. The system was evaluated on 148 nodules in 84 scans of the LIDC dataset. It performs well in terms of sensitivity, achieving 97.5 %, but underperforms in terms of false positives, with 6.76 FP/scan.

Messay et al. [92] computed a detailed feature set of 245 features (2D and 3D), mainly belonging to the shape, intensity and gradient feature classes. A sequential forward selection method was then applied to obtain the optimum feature subset. The system was evaluated on 143 nodules of the LIDC dataset. It performs well in terms of false positives, with 3 FP/scan, but underperforms in terms of sensitivity (82.66 %).

Akram et al. [57] computed the following features:

- 2D shape features (Area, Diameter, Perimeter, Circularity);
- 3D shape features (Volume, Compactness, Bounding Box Dimensions, Elongation, Principal Axis Length);
- 2D and 3D intensity-based statistical features (Mean inside, Mean outside, Variance inside, Kurtosis inside, Skewness inside, Minimum value inside, Eigenvalues).

The system was evaluated on the LIDC dataset and shows good sensitivity (95.31 %), but the number of nodules used to validate the results (50) is too small.

Table 4-6 summarizes the performance comparison of our proposed system with recently published lung CAD systems.

Table 4-6: Performance Comparison of Different CAD Systems, *N/A means Not Available.

CAD System | Year | Data Set | Nodule Size (mm) | Number of Nodules | Sensitivity (%) | FPs/scan
Proposed System | 2018 | LIDC | 3-30 | 2242 | 98.15 | 2.19
Setio et al. [111] | 2016 | LIDC | 3-30 | 1186 | 90.1 | 4.00
Dou et al. [136] | 2017 | LIDC | 3-30 | 1186 | 90.7 | 4.00
Bergtholdt et al. [137] | 2016 | LIDC | 3-30 | 690 | 85.9 | 2.50
Akram et al. [57] | 2016 | LIDC | 3-30 | 50 | 95.31 | N/A
Torres et al. [138] | 2015 | LIDC | 3-30 | 1749 | 80.00 | 8.00
van Ginneken et al. [110] | 2015 | LIDC | 3-30 | 1147 | 78.00 | 4.00
Choi et al. [12] | 2014 | LIDC | 3-30 | 148 | 97.50 | 6.76
Teramoto et al. [109] | 2014 | Private | 4-30 | 103 | 83.00 | 5.00
Choi et al. [18] | 2013 | LIDC | 3-30 | 151 | 95.28 | 2.27
Tartar et al. [108] | 2013 | Private | 2-20 | 95 | 89.60 | 7.90
Orozco et al. [107] | 2013 | LIDC, ELCAP | 2-30 | 75 | 84.00 | 7.00
Assefa et al. [105] | 2013 | ELCAP | N/A | 165 | 81.00 | 35.15
Choi et al. [30] | 2012 | LIDC | 3-30 | 76 | 94.10 | 5.45
Messay et al. [92] | 2010 | LIDC | 3-30 | 143 | 82.66 | 3.00
Sousa et al. [104] | 2010 | Private | 3-40 | 33 | 84.84 | N/A

Our system detected 2112 of the 2242 nodules marked with agreement level one (AL1) between the observers in 850 CT scans. The total number of false negatives (nodules missed by the system) is 130. We have evaluated the characteristics of these nodules with respect to size, internal structure and subtlety. We divided the missed nodules into four size categories: (i) nodules < 4 mm, (ii) nodules 4 to 6 mm, (iii) nodules 6 to 8 mm and (iv) nodules > 8 mm. As shown in Figure 4-12, our system missed:

- 31 of the 322 nodules smaller than 4 mm;
- 38 of the 657 nodules from 4 to 6 mm;
- 34 of the 697 nodules from 6 to 8 mm;
- 27 of the 566 nodules larger than 8 mm.

Figure 4-12 Number of False Negatives with respect to Size

This shows that the majority of the missed nodules (69 of the 130 false negatives) had a diameter of 6 mm or less, as shown in Figure 4-13. We also categorized the missed nodules with respect to texture, ranging from non-solid to solid nodules, where a score of 1 was given to non-solid (ground glass) nodules and a score of 5 to solid nodules. We merged the ratings of the radiologists and defined a nodule as subsolid if its average rating was less than 5. We found that 31 % of the false negatives were subsolid.

Figure 4-13 Percentage of False Negatives with respect to Size


We also categorized the false negatives with respect to subtlety, scored by the radiologists from 1 to 5, where 1 denotes extremely subtle and 5 denotes obvious. We merged the ratings of the radiologists by averaging them and defined a nodule as subtle if its average score was less than 3. We found that 25 % of the missed nodules were subtle.
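The merging rule described above can be sketched as follows. The radiologists’ ratings are averaged, and a nodule is then called:

- subsolid if the mean texture rating is below 5;
- subtle if the mean subtlety rating is below 3.

The agreement level is taken as the number of radiologists who marked the nodule. The per-reader rating lists in the example are hypothetical, and the function names are illustrative only.

```python
from statistics import mean
from typing import Dict, List


def merge_ratings(ratings: List[int]) -> float:
    """Average one characteristic over all radiologists who marked the nodule."""
    return mean(ratings)


def categorize(texture: List[int], subtlety: List[int]) -> Dict[str, object]:
    """Apply the thresholds described in the text to one nodule's ratings.

    texture:  1 = non-solid/ground glass ... 5 = solid (one rating per reader)
    subtlety: 1 = extremely subtle ... 5 = obvious (one rating per reader)
    """
    return {
        "agreement_level": len(texture),          # number of readers who marked it
        "subsolid": merge_ratings(texture) < 5,
        "subtle": merge_ratings(subtlety) < 3,
    }


if __name__ == "__main__":
    # Hypothetical nodule marked by three of the four radiologists.
    print(categorize(texture=[5, 4, 5], subtlety=[2, 3, 2]))
```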

The system’s overall detection sensitivity with respect to nodule size is shown in Figure 4-14. The detection sensitivity increases with nodule size, reaching a maximum of 95.22 % for nodules greater than 8 mm.
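The size-wise figures above follow directly from the false-negative counts per size group: 31 of 322, 38 of 657, 34 of 697 and 27 of 566 nodules. The short sketch below recomputes each group’s detection sensitivity and its share of the 130 false negatives, using only counts stated in this section. Small last-digit differences from the values quoted in the text or figures can arise from rounding.

```python
# (size group, total nodules, missed nodules) at agreement level 1, from the text.
SIZE_GROUPS = [
    ("< 4 mm", 322, 31),
    ("4-6 mm", 657, 38),
    ("6-8 mm", 697, 34),
    ("> 8 mm", 566, 27),
]


def per_group_stats(groups):
    """Return {group: (sensitivity %, share of all false negatives %)}."""
    total_fn = sum(missed for _, _, missed in groups)
    return {
        name: (100 * (total - missed) / total, 100 * missed / total_fn)
        for name, total, missed in groups
    }


if __name__ == "__main__":
    for name, (sens, share) in per_group_stats(SIZE_GROUPS).items():
        print(f"{name:7s} sensitivity {sens:6.2f} %   share of FNs {share:5.1f} %")
    detected = sum(t - m for _, t, m in SIZE_GROUPS)
    print("detected", detected, "of", sum(t for _, t, _ in SIZE_GROUPS))
```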

Figure 4-14 Detection Sensitivity with respect to Nodule Size

We further investigated the system’s performance at agreement level 3 (AL3), where there is majority agreement between the observers. In this case, the number of nodules in the 850 CT scans was reduced from 2242 to 1160. The system’s performance remained almost the same, with a slight increase in detection sensitivity for nodules < 4 mm, as shown in Figure 4-15. At AL3, the detection sensitivity of the proposed system was 91.32 % for nodules smaller than 4 mm, 94.32 % for nodules of 4 to 6 mm and 94.68 % for nodules of 6 to 8 mm. The system achieved a maximum sensitivity of 94.98 % for nodules greater than 8 mm.

Figure 4-15 Comparison of System’s Overall Performance w.r.t. different agreement levels

Figure 4-16 presents some of the nodules missed by our system (false negatives). The characteristics of these missed nodules are discussed below.

Figure 4-16 Sample nodules missed by the proposed system (False Negatives), indicated by the red arrow. Encircled objects in the respective figures represent False Positives.

Figure 4-16 (a) shows a missed nodule (indicated with a red arrow) with agreement level 4, i.e., all four radiologists marked it. The diameter of the missed nodule was approximately 7 mm. We investigated different characteristics of the nodule as scored by the radiologists on different scales. The scores of each characteristic from the multiple radiologists were averaged to produce a single value. The merged scores of this nodule are as follows:

- Subtlety was 2. Subtlety was scored from 1 = extremely subtle to 5 = obvious.
- Internal structure was 1. Internal structure was scored from 1 to 4, where 1 represents soft tissue, 2 fluid, 3 fat and 4 air.
- Internal calcification was 6. Calcification was scored from 1 to 6, where 1 denotes popcorn, 2 laminated, 3 solid, 4 non-central, 5 central and 6 absence of calcification.
- Sphericity, which describes the roundness of the nodule, was 3. Sphericity was scored from 1 to 5, where 1 represents a linear appearance and 5 a round appearance.
- Margin, which describes how well defined the nodule is, was 3. Margin was scored from 1 = poorly defined to 5 = well defined.
- Spiculation was 5. Spiculation was scored from 1 to 5, with only the extreme values defined: 1 = no spiculation and 5 = marked spiculation.
- Texture was 5. Texture was scored from 1 to 5 with only three values explicitly defined: 1 = non-solid/Ground Glass nodule, 3 = part-solid and 5 = solid.

We analysed spiculation rather than lobulation because spiculation is more relevant to the description of a lung nodule, whereas lobulation may be associated with cancers that do not originate primarily in the lung. The investigated characteristics of the sample missed nodules in Figure 4-16 are summarized in Table 4-7.

Table 4-7 Average Scores of Different Characteristics of Sample False Negatives

Nodule | AL* | D** (mm) | Subtlety | Internal Structure | Calcification | Sphericity | Margin | Spiculation | Texture
Fig 4-16 (a) | 04 | 7 | 2 | 1 | 6 | 3 | 3 | 5 | 5
Fig 4-16 (b) | 01 | 19 | 3 | 1 | 6 | 4 | 3 | 2 | 5
Fig 4-16 (c) | 02 | 6 | 2 | 1 | 6 | 4 | 1 | 1 | 1
Fig 4-16 (d) | 03 | 4 | 2 | 1 | 6 | 3 | 3 | 1 | 5
Fig 4-16 (e) | 01 | 4 | 5 | 1 | 3 | 5 | 5 | 1 | 5

*AL denotes Agreement Level between Observers; **D denotes Nodule Diameter.

On close examination of the values in Table 4-7, nodules (a), (c) and (d) of Figure 4-16 can be marked as “very subtle”, with a merged subtlety of 2 out of 5. All three cases lack calcification. They are also not well defined, with an average margin rating of 3 for (a) and (d) and 1 (poorly defined) for (c). The scores of (b) closely resemble those of these cases, with a value of 3 for both subtlety and margin. In contrast, the nodule shown in Figure 4-16 (e) was rated obvious (subtlety 5), well defined (margin 5) and round (sphericity 5), and so represents a rare failure of the system. Figure 4-17 shows sample detected nodules (True Positives). These samples show that the proposed system performs well for different nodule types of varying size and texture.

Figure 4-17 Sample images of nodules detected by the proposed system (True Positives, highlighted). The arrow in the respective figures indicates a False Positive detected by the system along with the True Positive.

Figure 4-18 shows the overall performance of our proposed CAD system as free-response ROC (FROC) curves [139], using the SVM classifier with different kernel functions and the 2-fold cross validation scheme. FROC curves plot detection sensitivity against the average number of false positives per scan. The system shows robust and accurate performance in detecting nodules.

Figure 4-18: FROC curves of the proposed system with respect to the different kernel

functions of SVM classifier.
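An FROC curve such as those in Figure 4-18 can be traced by lowering a threshold on the candidates’ classifier scores. At each step, two quantities are recorded: the average number of false positives per scan, and the fraction of reference nodules hit by at least one retained candidate.

The sketch below assumes each candidate has already been matched to a reference nodule or to none. Both the matching rule and the choice to ignore duplicate hits on the same nodule (rather than count them as false positives) are assumptions, and the candidates are toy data.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    scan_id: int                 # kept for bookkeeping; not used in the counts
    score: float                 # classifier output, higher = more nodule-like
    nodule_id: Optional[int]     # matched reference nodule, None for a false positive


def froc_points(cands: List[Candidate], n_nodules: int,
                n_scans: int) -> List[Tuple[float, float]]:
    """(FPs per scan, sensitivity) at every distinct score threshold."""
    points = []
    for t in sorted({c.score for c in cands}, reverse=True):
        kept = [c for c in cands if c.score >= t]
        hit = {c.nodule_id for c in kept if c.nodule_id is not None}
        fps = sum(1 for c in kept if c.nodule_id is None)
        points.append((fps / n_scans, len(hit) / n_nodules))
    return points


def sensitivity_at(points: List[Tuple[float, float]], max_fps: float) -> float:
    """Best sensitivity reachable without exceeding `max_fps` FPs per scan."""
    return max((s for f, s in points if f <= max_fps), default=0.0)


if __name__ == "__main__":
    toy = [Candidate(0, 0.9, 1), Candidate(0, 0.8, None), Candidate(1, 0.7, 2),
           Candidate(1, 0.6, 2), Candidate(1, 0.5, None)]
    pts = froc_points(toy, n_nodules=3, n_scans=2)
    print(pts)
    print(sensitivity_at(pts, max_fps=0.5))
```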


Chapter 5: CONCLUSION AND FUTURE PROSPECT

5.1 Conclusion

A well-performing CAD system contributes to health provision by helping expert radiologists detect lung cancer and by providing them with a second opinion. In this dissertation, we have proposed a method with a hybrid feature set for lung nodule detection. The main stages are:

- In the pre-processing stage, the lung image is thresholded using optimal thresholding, followed by background removal, hole filling operations and lung segmentation.
- The contours of the segmented lung fields are then corrected to include juxta-pleural nodules.
- The candidate nodules are detected and segmented simultaneously from an image enhanced with a multi-scale dot enhancement filter.
- Shape, intensity and texture features are extracted from the lung nodule candidates and used for false positive reduction with an SVM classifier.

The proposed system has been evaluated using the LIDC dataset and k-fold cross validation, achieving a sensitivity of 98.15 % with only 2.19 false positives per scan.

In this thesis, we have used a hybrid feature set to improve the classification accuracy of the system. We have also compared feature classes, and the comparison clearly indicates that no single feature class can detect the nodules with high precision. Choosing the right set of features can therefore improve the overall accuracy of the system by improving the sensitivity and reducing false positives. We also experimented with different classifiers to assess the performance of the system. The results show that SVM, with the flexibility of different kernel functions, remains a better choice than the other classifiers in terms of accuracy.

5.2 Follow-up Work and Prospects

Computer-aided detection of lung cancer still leaves considerable room for research. Although there has been a lot of research in this area, further technological development and research are needed to produce a commercial product ready for use in hospitals. The further work and prospects identified during this research are as follows:

1. The segmentation of suspected pulmonary nodules needs further research and development, which requires the construction of a more complete template library for pulmonary nodules. Recently, some researchers have collected a large variety of nodule types and experimented with creating a pulmonary nodule template library, but the main problem encountered is the diversity of nodule characteristics.

2. The computer-aided detection system should be tested on sufficiently large datasets to achieve more robustness. Currently, CAD systems are evaluated on relatively small datasets, so their performance may well be affected in real-time clinical tests. Large-scale experiments can demonstrate the completeness and generality of the detection system and help form products according to market needs.

3. Selection of the lung nodule feature set is another area that needs further research, in particular into new features that are more descriptive and can play a critical role in the classification phase. Currently, almost 150 extractable features have been reported in the literature. As the type and number of features increase, the feature selection method can be optimized, the characteristics and specific performance of the features can be studied in depth, and the advantage of feature combination can be maximized. The optimization of the feature set is also of critical importance, as the issue of overfitting must be taken into consideration.

4. The CAD system for lung cancer can include three-dimensional reconstruction and processing functions. With the rapid development of medical image processing technology, three-dimensional reconstruction has become an important part of image processing, and doctors need more complete three-dimensional image products. Three-dimensional image segmentation, and the visualization of preoperative products based on three-dimensional reconstruction, require in-depth study and research.

5. Another future area that needs attention is the detection of micro nodules (< 3 mm). Future CAD systems should be able to detect all types of nodules (including micro nodules) while maintaining the same high sensitivity and low number of FP/scan.

REFERENCES

[1] R. L. Siegel, K. D. Miller, and A. Jemal, “Cancer statistics, 2016,” CA. Cancer J. Clin., vol. 66, no. 1, pp. 7–30, Jan. 2016.

[2] J. M. Diaz, R. C. Pinon, and G. Solano, “Lung cancer classification using genetic

algorithm to optimize prediction models,” in IISA 2014, The 5th International

Conference on Information, Intelligence, Systems and Applications, 2014, pp. 1–6.

[3] A. B. Mariotto, K. Robin Yabroff, Y. Shao, E. J. Feuer, and M. L. Brown, “Projections

of the Cost of Cancer Care in the United States: 2010-2020,” JNCI J. Natl. Cancer Inst.,

vol. 103, no. 2, pp. 117–128, Jan. 2011.

[4] N. Howlader, A. M. Noone, M. Krapcho, D. Miller, K. Bishop, S. F. Altekruse, C. L. Kosary et al., “SEER Cancer Statistics Review, 1975-2013,” National Cancer Institute, Bethesda, MD, 2016. [Online]. Available: http://seer.cancer.gov/csr [2016-02-16].

[5] C. Tiwari, K. Beyer, and G. Rushton, “The Impact of Data Suppression on Local

Mortality Rates: The Case of CDC WONDER,” Am. J. Public Health, vol. 104, no. 8,

pp. 1386–1388, Aug. 2014.

[6] M. A. Moore, P. Attasara, T. Khuhaprema, T. N. Le, T. H. N. Nguyen, P. P. Raingsey,

S. Sriamporn, H. Sriplung, P. Srivanatanakul, D. T. Bui, S. Wiangnon, and T. Sobue,

“Cancer epidemiology in mainland South-East Asia - past, present and future.,” Asian

Pac. J. Cancer Prev., vol. 11 Suppl 2, pp. 67–80, 2010.

[7] J. Ferlay, I. Soerjomataram, M. Ervik, R. Dikshit, S. Eser, C. Mathers, M. Rebelo, D. M. Parkin, D. Forman, and F. Bray, “Cancer Incidence and Mortality Worldwide: IARC CancerBase,” vol. 10, 2008. Lyon, France: International Agency for Research on Cancer, 2013.

[8] D. Forman, J. Ferlay, B. W. Stewart, and C. P. Wild, “The global and regional burden of cancer,” in World Cancer Report 2014, 2014, pp. 16–53.

[9] M. Luqman, M. M. Javed, S. Daud, N. Raheem, J. Ahmad, and A.-U.-H. Khan, “Risk

factors for lung cancer in the Pakistani population.,” Asian Pac. J. Cancer Prev., vol.

15, no. 7, pp. 3035–9, 2014.

[10] Office of the Press Secretary, The White House, “FACT SHEET: Investing in the National Cancer Moonshot,” 01 Feb. 2016. [Online]. Available: https://www.whitehouse.gov/the-press-office/2016/02/01/fact-sheet-investing-national-cancer-moonshot. [Accessed: 01-Aug-2016].

[11] The National Lung Screening Trial Research Team, “Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening,” N. Engl. J. Med., vol. 365, no. 5, pp. 395–409, Aug. 2011.

[12] W. J. Choi and T. S. Choi, “Automated pulmonary nodule detection based on three-

dimensional shape-based feature descriptor,” Comput. Methods Programs Biomed., vol.

113, no. 1, pp. 37–54, 2014.

[13] I. R. S. Valente, P. C. Cortez, E. C. Neto, J. M. Soares, V. H. C. de Albuquerque, and J.

M. R. S. Tavares, “Automatic 3D pulmonary nodule detection in CT images: A survey,”

Comput. Methods Programs Biomed., vol. 124, pp. 91–107, 2015.

[14] W. J. Kostis, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, “Three-Dimensional

Segmentation and Growth-Rate Estimation of Small Pulmonary Nodules in Helical CT

Images,” IEEE Trans. Med. Imaging, vol. 22, no. 10, pp. 1259–1274, 2003.


[15] C. I. Henschke, D. I. McCauley, D. F. Yankelevitz, D. P. Naidich, G. McGuinness, O.

S. Miettinen, D. M. Libby, M. W. Pasmantier, J. Koizumi, N. K. Altorki, and J. P. Smith,

“Early Lung Cancer Action Project: overall design and findings from baseline

screening.,” Lancet (London, England), vol. 354, no. 9173, pp. 99–105, Jul. 1999.

[16] S. S. Parveen and C. Kavitha, “A Review on Computer Aided Detection and Diagnosis

of lung cancer nodules,” Int. J. Comput. Technol., vol. 3, no. 3, pp. 393–400, 2012.

[17] M. Mabrouk, A. Karrar, and A. Sharawy, “Computer Aided Detection of Large Lung

Nodules using Chest Computer Tomography Images,” Computer (Long. Beach. Calif).,

vol. 3, no. 9, pp. 12–18, 2012.

[18] W. J. Choi and T. S. Choi, “Automated pulmonary nodule detection system in computed

tomography images: A hierarchical block classification approach,” Entropy, vol. 15, no.

2, pp. 507–523, 2013.

[19] D. J. Brenner and E. J. Hall, “Computed Tomography — An Increasing Source of

Radiation Exposure,” N. Engl. J. Med., vol. 357, no. 22, pp. 2277–2284, Nov. 2007.

[20] S. Diederich, M. Lentschig, T. Overbeck, D. Wormanns, and W. Heindel, “Detection of

pulmonary nodules at spiral CT: comparison of maximum intensity projection sliding

slabs and single-image reporting,” Eur. Radiol., vol. 11, no. 8, pp. 1345–1350, Aug.

2001.

[21] K. Doi, “Computer-aided diagnosis in medical imaging: historical review, current status

and future potential.,” Comput. Med. Imaging Graph., vol. 31, no. 4–5, pp. 198–211,

2007.

[22] S. Zhou, Y. Cheng, and S. Tamura, “Automated lung segmentation and smoothing

techniques for inclusion of juxtapleural nodules and pulmonary vessels on chest CT


images,” Biomed. Signal Process. Control, vol. 13, pp. 62–70, Sep. 2014.

[23] “R2 ImageChecker CAD - View all - Medical Imaging - Christie InnoMed.” [Online]. Available: http://www.christieinnomed.com/en/r2-imagechecker-cad. [Accessed: 27-Jul-2017].

[24] “Syngo LungCARE CT and syngo Lung CAD1.” [Online]. Available: https://www.healthcare.siemens.com/computed-tomography/options-upgrades/clinical-applications/syngo-lungcare-ct-and-syngo-lung-cad. [Accessed: 27-Jul-2017].

[25] “Veolity - a brand of MeVis Medical Solutions AG: Home.” [Online]. Available:

http://www.veolity.com/. [Accessed: 27-Jul-2017].

[26] S. Schalekamp, B. van Ginneken, B. Heggelman, M. Imhof-Tas, I. Somers, M. Brink,

M. Spee, C. Schaefer-Prokop, and N. Karssemeijer, “New methods for using computer-

aided detection information for the detection of lung nodules on chest radiographs,” Br.

J. Radiol., vol. 87, no. 1036, p. 20140015, Apr. 2014.

[27] S. G. Armato, G. McLennan, D. Hawkins, L. Bidaut, M. F. McNitt-Gray, C. R. Meyer,

A. P. Reeves, B. Zhao, D. R. Aberle, C. I. Henschke, E. A. Hoffman, E. A. Kazerooni,

H. MacMahon, E. J. R. Van Beeke, D. Yankelevitz, A. M. Biancardi, P. H. Bland, M.

S. Brown, R. M. Engelmann, G. E. Laderach, D. Max, R. C. Pais, D. P. Y. Qing, R. Y.

Roberts, A. R. Smith, A. Starkey, P. Batrah, P. Caligiuri, A. Farooqi, G. W. Gladish, C.

M. Jude, R. F. Munden, I. Petkovska, L. E. Quint, L. H. Schwartz, B. Sundaram, L. E.

Dodd, C. Fenimore, D. Gur, N. Petrick, J. Freymann, J. Kirby, B. Hughes, A. Vande

Casteele, S. Gupte, M. Sallamm, M. D. Heath, M. H. Kuhn, E. Dharaiya, R. Burns, D.

S. Fryd, M. Salganicoff, V. Anand, U. Shreter, S. Vastagh, B. Y. Croft, and L. P. Clarke,

“The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative


(IDRI): a completed reference database of lung nodules on CT scans.,” Med. Phys., vol.

38, no. 2, pp. 915–931, 2011.

[28] C. I. Henschke, D. I. McCauley, D. F. Yankelevitz, D. P. Naidich, G. McGuinness, O.

S. Miettinen, D. Libby, M. Pasmantier, J. Koizumi, N. Altorki, and J. P. Smith, “Early

lung cancer action project: a summary of the findings on baseline screening.,”

Oncologist, vol. 6, no. 2, pp. 147–52, 2001.

[29] Public Lung Image database to address drug response. Vision and Image Analysis Group

(VIA) and International Early Lung Cancer Action Program (I-ELCAP) Labs, Cornell

University. http://www.via.cornell.edu/crpf.html; 2008 [accessed 24-04-16].

[30] W. J. Choi and T. S. Choi, “Genetic programming-based feature transform and

classification for the automatic detection of pulmonary nodules on computed

tomography images,” Inf. Sci. (Ny)., vol. 212, pp. 57–78, 2012.

[31] J. Dehmeshki, X. Ye, X. Lin, M. Valdivieso, and H. Amin, “Automated detection of

lung nodules in CT images using shape-based genetic algorithm,” Comput. Med.

Imaging Graph., vol. 31, no. 6, pp. 408–417, 2007.

[32] J. J. Suárez-Cuenca, P. G. Tahoces, M. Souto, et al., “Application of the iris filter for

automatic detection of pulmonary nodules on computed tomography images,” Comput.

Biol. Med., vol. 39, no. 10, pp. 921–933, 2009.

[33] X. Ye, X. Lin, J. Dehmeshki, et al., “Shape based computer-aided detection of lung

nodules in thoracic CT images,” IEEE Trans. Biomed. Eng. vol. 56, no. 7, pp. 1810–

1820, 2009.

[34] I. Sluimer, M. Prokop, and B. van Ginneken, “Toward automated segmentation of the pathological lung in CT,” IEEE Trans. Med. Imaging, vol. 24, no. 8, pp. 1025–1038, 2005.

[35] G. De Nunzio, E. Tommasi, A. Agrusti, et al., “Automatic lung segmentation in CT

images with accurate handling of the hilar region,” J. Digit. Imaging, vol. 24, no. 1, pp.

11–27, 2011.

[36] A. M. Ali and A. A. Farag, “Automatic Lung Segmentation of Volumetric Low-Dose

CT Scans Using Graph Cuts,” in Advances in Visual Computing: 4th International

Symposium, ISVC 2008, Las Vegas, NV, USA, December 1-3, 2008. Proceedings, Part

I, G. Bebis, R. Boyle, B. Parvin, D. Koracin, P. Remagnino, F. Porikli, J. Peters, J.

Klosowski, L. Arns, Y. K. Chun, T.-M. Rhyne, and L. Monroe, Eds. Berlin, Heidelberg:

Springer Berlin Heidelberg, 2008, pp. 258–267.

[37] E. van Rikxoort, B. de Hoop, and M. Viergever, “Automatic lung segmentation from

thoracic computed tomography scans using a hybrid approach with error detection,”

Med. Phys., vol. 36, no. 7, pp. 2934, 2009.

[38] D.S. Paik, C. F. Beaulieu, G. D. Rubin, et al., “Surface normal overlap: a computer-

aided detection algorithm with application to colonic polyps and lung nodules in helical

CT,” IEEE Trans. Med. Imaging, vol. 23, no. 6, pp. 661–675, 2004.

[39] A. Besbes and N. Paragios, “Landmark-based segmentation of lungs while handling

partial correspondences using sparse graph-based priors,” in Proceedings of the

International Symposium on Biomedical Imaging (ISBI ’11), 2011, pp. 989–995.

[40] M. Sofka, J. Wetzl, N. Birkbeck et al., “Multi-stage learning for robust lung segmentation in challenging CT volumes,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI ’11), 2011, pp. 667–674.

[41] S. Sun, C. Bauer, and R. Beichel, “Automated 3-D segmentation of lungs with lung

cancer in CT data using a novel robust active shape model approach,” IEEE

Transactions on Medical Imaging, vol. 31, no. 2, pp. 449–460, 2012.

[42] A. Mansoor, U. Bagci, Z. Xu, B. Foster, K. N. Olivier, J. M. Elinoff, A. F. Suffredini, J.

K. Udupa, and D. J. Mollura, “A Generic Approach to Pathological Lung

Segmentation,” IEEE Trans. Med. Imaging, vol. 33, no. 12, pp. 2293–2310, Dec. 2014.

[43] S. Dai, K. Lu, J. Dong, Y. Zhang, and Y. Chen, “A novel approach of lung segmentation

on chest CT images using graph cuts,” Neurocomputing, vol. 168, pp. 799–807, Nov.

2015.

[44] A. Soliman, F. Khalifa, A. Elnakib, M. Abou El-Ghar, N. Dunlap, B. Wang, G.

Gimel’farb, R. Keynton, and A. El-Baz, “Accurate Lungs Segmentation on CT Chest

Images by Adaptive Appearance-Guided Shape Modeling,” IEEE Trans. Med. Imaging,

vol. 36, no. 1, pp. 263–276, Jan. 2017.

[45] A. M. Mendonca, J. A. da Silva, and A. Campilho, “Automatic delimitation of lung fields

on chest radiographs,” in Proceedings of the International Symposium on Biomedical

Imaging (ISBI ’04), vol. 2, 2004, pp. 1287–1290.

[46] Y. Yim, H. Hong, and Y. G. Shin, “Hybrid lung segmentation in chest CT images for

computer-aided diagnosis,” in 7th International Workshop on Enterprise Networking

and Computing in Healthcare Industry, HEALTHCOM2005, June 2005, pp. 378–383.


[47] P. Campadelli, E. Casiraghi, and D. Artioli, “A fully automated method for lung nodule

detection from postero-anterior chest radiographs,” IEEE Transactions on Medical

Imaging, vol. 25, no. 12, pp. 1588–1603, 2006.

[48] P. Korfiatis, S. Skiadopoulos, P. Sakellaropoulos, C. Kalogeropoulou, and L. Costaridou,

“Combining 2D wavelet edge highlighting and 3D thresholding for lung segmentation

in thin-slice CT,” British Journal of Radiology, vol. 80, no. 960, pp. 996–1005, 2007.

[49] S. Hu, E. A. Hoffman, and J. M. Reinhardt, “Automatic lung segmentation for accurate

quantitation of volumetric X-ray CT images,” IEEE Transactions on Medical Imaging,

vol. 20, no. 6, pp. 490–498, 2001.

[50] Q. Gao, S. Wang, D. Zhao, and J. Liu, “Accurate lung segmentation for X-ray CT

images,” in Proceedings of the 3rd International Conference on Natural Computation

(ICNC ’07), vol. 2, 2007, pp. 275–279.

[51] Z. Shi, J. Ma, M. Zhao, Y. Liu, Y. Feng, M. Zhang, L. He, and K. Suzuki, “Many Is

Better Than One: An Integration of Multiple Simple Strategies for Accurate Lung

Segmentation in CT Images,” Biomed Res. Int., vol. 2016, pp. 1–13, Aug. 2016.

[52] Y. Shi, F. Qi, Z. Xue et al., “Segmenting lung fields in serial chest radiographs using

both population-based and patient-specific shape statistics,” IEEE Transactions on

Medical Imaging, vol. 27, no. 4, pp. 481–494, 2008.


[53] A. El-Baz, G. Gimel’farb, R. Falk, M. Abou El-Ghar, T. Holland, and T. Shafer, “A

new stochastic framework for accurate lung segmentation,” in Proceedings of the

International Conference on Medical Image Computing and Computer-Assisted

Intervention (MICCAI ’08), New York, NY, USA, September 2008, pp. 322–330.

[54] P. Annangi, S. Thiruvenkadam, A. Raja, H. Xu, X. Sun, and L. Mao, “Region based

active contour method for x-ray lung segmentation using prior shape and low level

features,” in Proceedings of the 7th IEEE International Symposium on Biomedical

Imaging: from Nano to Macro (ISBI ’10), April 2010, pp. 892–895.

[55] P. P. Rebouças Filho, P. C. Cortez, A. C. da Silva Barros, V. H. C. Albuquerque, and J.

M. R. S. Tavares, “Novel and powerful 3D adaptive crisp active contour method applied

in the segmentation of CT lung images,” Med. Image Anal., vol. 35, pp. 503–516, Jan.

2017.

[56] A. El-Baz, G. M. Beache, G. Gimel’Farb, K. Suzuki, K. Okada, A. Elnakib, A. Soliman,

and B. Abdollahi, “Computer-aided diagnosis systems for lung cancer: Challenges and

methodologies,” Int. J. Biomed. Imaging, vol. 2013, 2013.

[57] S. Akram, M. Y. Javed, M. U. Akram, U. Qamar, and A. Hassan, “Pulmonary Nodules

Detection and Classification Using Hybrid Features from Computerized Tomographic

Images,” J. Med. Imaging Heal. Informatics, vol. 6, no. 1, pp. 252–259, Feb. 2016.

[58] J. P. Ko and M. Betke, “Chest CT: automated nodule detection and assessment of change

over time—preliminary experience,” Radiology, vol. 218, no. 1, pp. 267–273, 2001.


[59] B. Zhao, M. S. Ginsberg, R. A. Lefkowitz, L. Jiang, C. Cooper, and L.H. Schwartz,

“Application of the LDM algorithm to identify small lung nodules on low dose MSCT

scans,” in Proceedings of the Progress in Biomedical Optics and Imaging—Medical

Imaging 2004: Image Processing, February 2004, pp. 818–823.

[60] L. Gonçalves, J. Novo, and A. Campilho, “Hessian based approaches for 3D lung nodule

segmentation,” Expert Syst. Appl., vol. 61, pp. 1–15, Nov. 2016.

[61] B. Chen, T. Kitasaka, H. Honma, H. Takabatake, M. Mori, H. Natori, and K. Mori,

“Automatic segmentation of pulmonary blood vessels and nodules based on local

intensity structure analysis and surface propagation in 3D chest CT images,” Int. J.

Comput. Assist. Radiol. Surg., vol. 7, no. 3, pp. 465–482, 2012.

[62] Q. Li and K. Doi, “New selective nodule enhancement filter and its application for significant improvement of nodule detection on computed tomography,” in Medical Imaging 2004: Image Processing, San Diego, CA, USA, February 14, 2004, pp. 1–9.

[63] H. Hasanabadi, M. Zabihi, and Q. Mirsharif, “Detection of pulmonary nodules in CT images using template matching and neural classifier,” Journal of Advances in Computer Research, vol. 5, no. 1, pp. 19–28, 2014.

[64] R. Wiemker, P. Rogalla, A. Zwartkruis, and T. Blaffert, “Computer aided lung nodule

detection on high resolution CT data,” in Medical Imaging: Image Processing, vol.

4684 of Proceedings of SPIE, February 2002, pp. 677–688.

[65] Y. Lee, T. Hara, H. Fujita, S. Itoh, and T. Ishigaki, “Automated detection of pulmonary

nodules in helical CT images based on an improved template-matching technique,”

IEEE Transactions on Medical Imaging, vol. 20, no. 7, pp. 595–604, 2001.


[66] A. El-Baz, A. Elnakib, M. Abou El-Ghar, G. Gimel’Farb, R. Falk, and A. Farag,

“Automatic detection of 2D and 3D lung nodules in chest spiral CT scans,” Int. J.

Biomed. Imaging, vol. 2013, 2013.

[67] D. Cascio, R. Magro, F. Fauci, M. Iacomi, and G. Raso, “Automatic detection of lung nodules in CT datasets based on stable 3D mass-spring models,” Comput. Biol. Med., vol. 42, no. 11, pp. 1098–1109, 2012.

[68] S. Soltaninejad, M. Keshani, and F. Tajeripour, “Lung nodule detection by KNN classifier and active contour modelling and 3D visualization,” in The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012), IEEE, Shiraz, Fars, Iran, May 2–3, 2012, pp. 440–445.

[69] J. Pu, D. S. Paik, X. Meng, J. Roos, and G. D. Rubin, “Shape ‘Break-and-

Repair’ Strategy and Its Application to Automated Medical Image Segmentation,”

IEEE Trans. Vis. Comput. Graph., vol. 17, no. 1, pp. 115–124, Jan. 2011.

[70] T. Kubota, A. K. Jerebko, M. Dewan, M. Salganicoff, and A. Krishnan, “Segmentation

of pulmonary nodules of various densities with morphological approaches and

convexity models,” Med. Image Anal., vol. 15, no. 1, pp. 133–154, 2011.

[71] A. Riccardi, T. S. Petkov, G. Ferri, M. Masotti, and R. Campanini, “Computer-aided detection of lung nodules via 3D fast radial transform, scale space representation, and Zernike MIP classification,” Med. Phys., vol. 38, no. 4, pp. 1962–1971, 2011.

[72] S. Taghavi Namin, H. Abrishami Moghaddam, R. Jafari, M. Esmaeil-Zadeh, and M. Gity, “Automated detection and classification of pulmonary nodules in 3D thoracic CT images,” in IEEE International Conference on Systems, Man and Cybernetics, IEEE, Istanbul, Turkey, October 10–13, 2010, pp. 3774–3779.


[73] K. Murphy, A. Schilham, H. Gietema, M. Prokop, and B. van Ginneken, “Automated detection of pulmonary nodules from low-dose computed tomography scans using a two-stage classification system based on local image features,” in Medical Imaging 2007: Computer-Aided Diagnosis, International Society for Optics and Photonics, San Diego, CA, USA, February 17, 2007, pp. 651410-1–651410-12.

[74] S. Ozekes, O. Osman, and O. N. Ucan, “Nodule detection in a lung region that’s

segmented with using genetic cellular neural networks and 3D template matching with

fuzzy rule based thresholding,” Korean J. Radiol., vol. 9, no. 1, pp. 1–9, 2008.

[75] Z. Ge, B. Sahiner, H.-P. Chan, L. M. Hadjiiski, P. N. Cascade, N. Bogot, E. A. Kazerooni, J. Wei, and C. Zhou, “Computer-aided detection of lung nodules: False positive reduction using a 3D gradient field method and 3D ellipsoid fitting,” Med. Phys., vol. 32, no. 8, pp. 2443–2454, 2005.

[76] P. R. S. Mendonca, R. Bhotika, S. A. Sirohey, W. D. Turner, J. V. Miller, and R. S.

Avila, “Model-based analysis of local shape for lesion detection in CT scans,” in

Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI ’05), vol. 8, 2005, pp. 688–695.

[77] S. Chang, H. Emoto, D. N. Metaxas, and L. Axe, “Pulmonary micronodule detection

from 3D chest CT,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI ’04), vol. 3217, 2004, pp. 821–828.

[78] H. Takizawa, K. Shigemoto, S. Yamamoto et al., “A recognition method of lung nodule

shadows in X-Ray CT images using 3D object models,” International Journal of Image

and Graphics, vol. 3, no. 4, pp. 533–545, 2003.


[79] G. Agam, S. Armato, et al., “Vessel tree reconstruction in thoracic CT scans with application to nodule detection,” IEEE Trans. Med. Imaging, vol. 24, no. 4, pp. 486–499, 2005.

[80] K. Awai, K. Murao, A. Ozawa et al., “Pulmonary nodules at chest CT: effect of computer-

aided diagnosis on radiologists’ detection performance,” Radiology, vol. 230, no. 2, pp.

347–352, 2004.

[81] C. I. Fetita, F. Prêteux, C. Beigelman-Aubry, and P. Grenier, “3D automated lung nodule segmentation in HRCT,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI ’03), vol. 2878, 2003, pp. 626–634.

[82] M. Tanino, H. Takizawa, S. Yamamoto, T. Matsumoto, Y. Tateno, and T. Iinuma, “A

detection method of ground glass opacities in chest X-ray CT images using automatic

clustering techniques,” in Medical Imaging: Image Processing, vol. 5032 of

Proceedings of SPIE, February 2003, pp. 1728–1737.

[83] T. Ezoe, H. Takizawa, S. Yamamoto et al., “An automatic detection method of lung

cancers including ground glass opacities from chest X-ray CT images,” in Medical

Imaging: Image Processing, vol. 4684 of Proceedings of SPIE, February 2002, pp.

1672–1680.

[84] S. Saita, T. Oda, M. Kubo et al., “Nodule detection algorithm based on multi-slice CT

images for lung cancer screening,” in Medical Imaging: Imaging Processing,


[85] T. Oda, M. Kubo, Y. Kawata et al., “A detection algorithm of lung cancer candidate

nodules on multi-slice CT images,” in Medical Imaging: Image Processing, vol. 5370

of Proceedings of SPIE, February 2002, pp. 1354–1361.


[86] N. Yamada, M. Kubo, Y. Kawata et al., “ROI extraction of chest CT images using

adaptive opening filter,” in Medical Imaging: Image Processing, vol. 5032 of


[87] M. N. Gurcan, B. Sahiner, N. Petrick et al., “Lung nodule detection on thoracic computed

tomography images: preliminary evaluation of a computer-aided diagnosis system,”

Medical Physics, vol. 29, no. 11, pp. 2552–2558, 2002.

[88] M. Kubo, K. Kubota, N. Yamada et al., “A CAD system for lung cancer based on low

dose single-slice CT image,” in Medical Imaging: Image Processing, vol. 4684 of


[89] Y. Mekada, T. Kusanagi, Y. Hayase, K. Mori, J.-I. Hasegawa, J.-I. Toriwaki, M. Mori, and H. Natori, “Detection of small nodules from 3D chest X-ray CT images based on shape features,” Int. Congr. Ser., vol. 1256, pp. 971–976, 2003.

[90] M. S. Brown, M. F. McNitt-Gray, J. G. Goldin, R. D. Suh, J. W. Sayre, and D. R. Aberle,

“Patient-specific models for lung nodule detection and surveillance in CT images,”

IEEE Transactions on Medical Imaging, vol. 20, no. 12, pp. 1242–1250, 2001.

[91] Y. Kawata, N. Niki, H. Ohmatsu et al., “Computer aided diagnosis of pulmonary nodules

using three-dimensional thoracic CT images,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI ’01), vol. 2208, 2001, pp. 1393–1394.

[92] T. Messay, R. C. Hardie, S. K. Rogers, A. Ekin, V. Romano, T. Bülow, N. Bogot, C.

Zhou, A. Chughtai, C. Poopat, et al., “A new computationally efficient CAD system for pulmonary nodule detection in CT imagery,” Med. Image Anal., vol. 14, no. 3, pp.

390–406, Jun. 2010.


[93] S. L. A. Lee, A. Z. Kouzani, and E. J. Hu, “Random forest based lung nodule

classification aided by clustering,” Comput. Med. Imaging Graph., vol. 34, no. 7, pp.

535–542, 2010.

[94] M. Niemeijer, M. Loog, M. D. Abramoff, M. A. Viergever, M. Prokop, and B. van

Ginneken, “On Combining Computer-Aided Detection Systems,” IEEE Trans. Med.

Imaging, vol. 30, no. 2, pp. 215–223, Feb. 2011.

[95] P. G. Espejo, S. Ventura, and F. Herrera, “A Survey on the Application of Genetic

Programming to Classification,” IEEE Trans. Syst. Man, Cybern. Part C Appl. Rev., vol.

40, no. 2, pp. 121–144, 2010.

[96] S. C. B. Lo, H. Li, Y. Wang, L. Kinnard, and M. T. Freedman, “A multiple circular path

convolution neural network system for detection of mammographic masses,” IEEE

Transactions on Medical Imaging, vol. 21, no. 2, pp. 150–158, 2002.

[97] K. Suzuki, “A supervised ’lesion-enhancement’ filter by use of a massive-training

artificial neural network (MTANN) in computer-aided diagnosis (CAD),” Physics in

Medicine and Biology, vol. 54, no. 18, pp. S31–S45, 2009.

[98] K. Sirinukunwattana, S. E. A. Raza, Y.-W. Tsang, D. R. J. Snead, I. A. Cree, and N. M.

Rajpoot, “Locality Sensitive Deep Learning for Detection and Classification of Nuclei

in Routine Colon Cancer Histology Images,” IEEE Trans. Med. Imaging, vol. 35, no.

5, pp. 1196–1206, May 2016.

[99] K. Murphy, B. van Ginneken, A. M. R. Schilham, B. J. de Hoop, H. A. Gietema, and M.

Prokop, “A large-scale evaluation of automatic pulmonary nodule detection in chest

CT using local image features and k-nearest-neighbour classification,” Med. Image

Anal., vol. 13, no. 5, pp. 757–770, 2009.


[100] W. Guo, Y. Wei, H. Zhou, and D. Xue, “An adaptive lung nodule detection algorithm,” in Chinese Control and Decision Conference, IEEE, 2009, pp. 2361–2365.

[101] Y. Liu, J. Yang, D. Zhao, and J. Liu, “Computer aided detection of lung nodules based on voxel analysis utilizing support vector machines,” in FBIE 2009 - 2009 Int. Conf. Futur. Biomed. Inf. Eng., 2009, pp. 90–93.

[102] A. Retico, M. E. Fantacci, I. Gori, P. Kasae, B. Golosio, A. Piccioli, P. Cerello, G. De Nunzio, and S. Tangaro, “Pleural nodule identification in low-dose and thin-slice lung computed tomography,” Comput. Biol. Med., vol. 39, no. 12, pp. 1137–1144, 2009.

[103] S. Ozekes and O. Osman, “Computerized lung nodule detection using 3D feature extraction and learning based algorithms,” J. Med. Syst., vol. 34, no. 2, pp. 185–194, 2010.

[104] J. R. F. D. S. Sousa, A. C. Silva, A. C. de Paiva, and R. A. Nunes, “Methodology for automatic detection of lung nodules in computerized tomography images,” Comput. Methods Programs Biomed., vol. 98, no. 1, pp. 1–14, 2010.

[105] M. Assefa, I. Faye, A. S. Malik, and M. Shoaib, “Lung nodule detection using multi-resolution analysis,” in 2013 ICME Int. Conf. Complex Med. Eng., 2013, pp. 457–461.

[106] A. Tariq, M. U. Akram, and M. Y. Javed, “Lung nodule detection in CT images using

neuro fuzzy classifier,” in 2013 Fourth Int. Work. Comput. Intell. Med. Imaging, 2013,

pp. 49–53.

[107] H. M. Orozco, O. O. V. Villegas, H. D. J. O. Dominguez, and V. G. C. Sanchez, “Lung nodule classification in CT thorax images using support vector machines,” in 2013 12th Mex. Int. Conf. Artif. Intell., 2013, pp. 277–283.


[108] A. Tartar, N. Kilic, and A. Akan, “Classification of pulmonary nodules by using hybrid features,” Comput. Math. Methods Med., vol. 2013, pp. 1–11, 2013.

[109] A. Teramoto, H. Fujita, K. Takahashi, O. Yamamuro, T. Tamaki, M. Nishio, and T. Kobayashi, “Hybrid method for the detection of pulmonary nodules using positron emission tomography/computed tomography: A preliminary study,” Int. J. Comput. Assist. Radiol. Surg., vol. 9, no. 1, pp. 59–69, 2014.

[110] B. van Ginneken, A. A. A. Setio, C. Jacobs, and F. Ciompi, “Off-the-shelf convolutional

neural network features for pulmonary nodule detection in computed tomography

scans,” in 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI),

2015, pp. 286–289.

[111] A. A. A. Setio, F. Ciompi, G. Litjens, P. Gerke, C. Jacobs, S. J. van Riel, M. M. W.

Wille, M. Naqibullah, C. I. Sanchez, and B. van Ginneken, “Pulmonary Nodule

Detection in CT Images: False Positive Reduction Using Multi-View Convolutional

Networks,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1160–1169, May 2016.

[112] R. Anirudh, J. J. Thiagarajan, T. Bremer, and H. Kim, “Lung nodule detection using

3D convolutional neural networks trained on weakly labeled data,” in Medical Imaging 2016: Computer-Aided Diagnosis, Proc. SPIE, vol. 9785, 2016, p. 978532.

[113] C. Jacobs, E. M. van Rikxoort, K. Murphy, M. Prokop, C. M. Schaefer-Prokop, and B.

van Ginneken, “Computer-aided detection of pulmonary nodules: a comparative study

using the public LIDC/IDRI database,” Eur. Radiol., vol. 26, no. 7, pp. 2139–2147, Jul.

2016.


[114] J. Ding, A. Li, Z. Hu, and L. Wang, “Accurate Pulmonary Nodule Detection in

Computed Tomography Images Using Deep Convolutional Neural Networks,” Jun.

2017, arXiv preprint arXiv:1706.04303.

[115] A. A. A. Setio, A. Traverso, T. de Bel, M. S. N. Berens, C. van den Bogaard, P. Cerello,

H. Chen, Q. Dou, M. E. Fantacci, B. Geurts, R. van der Gugten, P. A. Heng, B. Jansen,

M. M. J. de Kaste, V. Kotov, J. Y.-H. Lin, J. T. M. C. Manders, A. Sóñora-Mengana, J.

C. García-Naranjo, E. Papavasileiou, M. Prokop, M. Saletta, C. M. Schaefer-Prokop, E.

T. Scholten, L. Scholten, M. M. Snoeren, E. L. Torres, J. Vandemeulebroucke, N.

Walasek, G. C. A. Zuidhof, B. van Ginneken, and C. Jacobs, “Validation, comparison,

and combination of algorithms for automatic detection of pulmonary nodules in

computed tomography images: The LUNA16 challenge,” Med. Image Anal., vol. 42,

pp. 1–13, Dec. 2017.

[116] W. Zhu, C. Liu, W. Fan, and X. Xie, “DeepLung: 3D Deep Convolutional Nets for

Automated Pulmonary Nodule Detection and Classification,” Sep. 2017, arXiv preprint

arXiv:1709.05538.

[117] T. Huang, G. Yang, and G. Tang, “A fast two-dimensional median filtering algorithm,”

IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 1, pp. 13–18, Feb. 1979.

[118] S. G. Armato, M. L. Giger, C. J. Moran, J. T. Blackburn, K. Doi, and H. MacMahon,

“Computerized detection of pulmonary nodules on CT scans,” Radiographics, vol. 19,

no. 5, pp. 1303–1311, 1999.

[119] Z. Shi, M. Zhao, Y. Wang, L. He, K. Suzuki, C. Jin, and M. Zhang, “Hessian-log: A

novel dot enhancement filter,” ICIC Express Lett. Part B Appl., vol. 6, no. 8, pp. 1987–

1992, 2012.


[120] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Prentice Hall, 2008, ISBN: 9780131687288.

[121] Q. Li, S. Sone, and K. Doi, “Selective enhancement filters for nodules, vessels, and

airway walls in two- and three-dimensional CT scans,” Med. Phys., vol. 30, no. 8, pp.

2040–2051, 2003.

[122] L. Shapiro and G. Stockman, “Computer Vision,” Prentice Hall, 2001, p. 580, ISBN:

9780130307965.

[123] Y. Yu and H. Zhao, “Enhancement Filter for Computer-Aided Detection of Pulmonary

Nodules on Thoracic CT images,” Sixth Int. Conf. Intell. Syst. Des. Appl., vol. 2, 2006,

pp. 1200–1205.

[124] S. L. A. Lee, A. Z. Kouzani, and E. J. Hu, “Automated detection of lung nodules in

computed tomography images: A review,” Mach. Vis. Appl., vol. 23, no. 1, pp. 151–

163, 2012.

[125] R. M. Haralick, K. Shanmugam, and I. Dinstein, “Textural Features for Image

Classification,” IEEE Trans. Syst. Man. Cybern., vol. 3, no. 6, pp. 610–621, Nov. 1973.

[126] G. M. Xian, “An identification method of malignant and benign liver tumors from

ultrasonography based on GLCM texture features and fuzzy SVM,” Expert Syst. Appl.,

vol. 37, no. 10, pp. 6737–6741, 2010.

[127] J. L. Rodgers and W. A. Nicewander, “Thirteen Ways to Look at the Correlation

Coefficient,” Am. Stat., vol. 42, no. 1, p. 59, Feb. 1988.


[128] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin

classifiers,” in Proceedings of the fifth annual workshop on Computational learning

theory - COLT ’92, 1992, pp. 144–152.

[129] T. Sun, J. Wang, X. Li, P. Lv, F. Liu, Y. Luo, Q. Gao, H. Zhu, and X. Guo,

“Comparative evaluation of support vector machines for computer aided diagnosis of

lung cancer in CT based on a multi-dimensional data set,” Comput. Methods Programs

Biomed., vol. 111, no. 2, pp. 519–524, 2013.

[130] N. S. Altman, “An Introduction to Kernel and Nearest-Neighbor Nonparametric

Regression,” Am. Stat., vol. 46, no. 3, pp. 175–185, Aug. 1992.

[131] J. R. Quinlan, “Simplifying decision trees,” Int. J. Man. Mach. Stud., vol. 27, no. 3, pp.

221–234, Sep. 1987.

[132] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San

Diego, Calif, USA, 2nd edition, 1990, ISBN: 9780122698514.

[133] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, “A Practical Guide to Support Vector

Classification,” BJU Int., vol. 101, no. 1, pp. 1396–400, Feb. 2008.

[134] O. Chapelle and A. Zien, “Semi-Supervised Classification by Low Density Separation,”

Biol. Cybern., vol. 2005, pp. 57–64, 2005.

[135] J. A. Swets, “Measuring the accuracy of diagnostic systems,” Science, vol. 240, no.

4857, pp. 1285–93, Jun. 1988.

[136] Q. Dou, H. Chen, L. Yu, J. Qin, and P.-A. Heng, “Multilevel Contextual 3-D CNNs for

False Positive Reduction in Pulmonary Nodule Detection,” IEEE Trans. Biomed. Eng.,

vol. 64, no. 7, pp. 1558–1567, Jul. 2017.


[137] M. Bergtholdt, R. Wiemker, and T. Klinder, “Pulmonary nodule detection using a

cascaded SVM classifier,” in Medical Imaging 2016: Computer-Aided Diagnosis, Proc. SPIE, vol. 9785, 2016, p. 978513.

[138] E. L. Torres, E. Fiorina, F. Pennazio, C. Peroni, M. Saletta, N. Camarlinghi, M. E.

Fantacci, and P. Cerello, “Large scale validation of the M5L lung CAD on

heterogeneous CT datasets,” Med. Phys., vol. 42, pp. 1477–1489, 2015.

[139] D. P. Chakraborty, “Maximum likelihood analysis of free-response receiver operating

characteristic (FROC) data,” Med. Phys., vol. 16, no. 4, pp. 561–568, Jul. 1989.


APPENDIX A

DICOM ENCODING STRUCTURE

A DICOM file has the same overall structure as other image formats: a file header (which describes the overall structure of the file) followed by the data sets (which include the pixel data). The first part is the file header, which itself has two parts. The first is a 128-byte preamble reserved for information, here filled with 00H bytes, and the second is a 4-byte file type identification field whose content is "DICM", which indicates that the file is in DICOM format.
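A minimal Python sketch of this header check is given below. Python and the names `has_dicom_prefix` and `is_dicom_file` are illustrative assumptions rather than the implementation used in this thesis; only the 128-byte preamble and the "DICM" field are taken from the description above.

```python
PREAMBLE_LENGTH = 128   # reserved bytes at the start of the file
DICOM_PREFIX = b"DICM"  # 4-byte file type identification field


def has_dicom_prefix(data: bytes) -> bool:
    """Return True if `data` holds a 128-byte preamble followed by b"DICM"."""
    end = PREAMBLE_LENGTH + len(DICOM_PREFIX)
    if len(data) < end:
        return False  # too short to contain the file header
    return data[PREAMBLE_LENGTH:end] == DICOM_PREFIX


def is_dicom_file(path: str) -> bool:
    """Read only the first 132 bytes of a file and check the header."""
    with open(path, "rb") as f:
        return has_dicom_prefix(f.read(PREAMBLE_LENGTH + len(DICOM_PREFIX)))


print(has_dicom_prefix(bytes(128) + b"DICM"))  # True
print(has_dicom_prefix(bytes(132)))            # False
```

Reading only the first 132 bytes keeps the check cheap, so non-DICOM files can be rejected before any further parsing.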

The second part is the data set section, which encapsulates several data sets. Each data set is identified by a Group Number (GN) that occupies 2 bytes. Data sets are complex in structure and content and are categorized according to their function into the Default Dataset, the Standard Dataset and the Private Dataset. Some groups of the Default Dataset, namely groups 0001 through 0007 and group FFFF, must not be omitted from a DICOM file. For example, group 0002 holds information such as the transfer syntax, which tells the parsing program how to read the remaining data. A Standard Dataset is any even-numbered group that remains after the Default Dataset is removed; for instance, group 0008 records basic image information, such as the image type and image identification numbers, and group 0010 records patient information, such as the patient's name and age. A Private Dataset is any odd-numbered group other than those of the Default Dataset and group FFFF. Private data sets are necessary because they allow the major equipment manufacturers to store their own customized information.
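The grouping rules above can be expressed as a small classification function. This is a sketch that follows the categories exactly as described in this appendix (default groups 0001 to 0007 and FFFF, even-numbered standard groups, odd-numbered private groups); the function name `classify_group` is hypothetical.

```python
def classify_group(group: int) -> str:
    """Classify a 2-byte group number into the categories described above."""
    if not 0x0000 <= group <= 0xFFFF:
        raise ValueError("a group number must fit in 2 bytes")
    # Default Dataset: groups 0001 to 0007 and FFFF.
    if 0x0001 <= group <= 0x0007 or group == 0xFFFF:
        return "default"
    # Remaining groups: even numbers are standard, odd numbers are private.
    return "standard" if group % 2 == 0 else "private"


for g in (0x0002, 0x0008, 0x0010, 0x0009, 0x7FE0):
    print(f"{g:04X}: {classify_group(g)}")
```

For example, group 0008 (image information) and group 7FE0 (pixel data) come out as standard, while an odd group such as 0009 is reported as private.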

Because the information about the patient and the manufacturer's equipment is complex in type and content, each data set is decomposed into several data elements, each of which is distinguished by an Element Number (EN). Briefly, a data element consists of four main parts: the element tag, the value representation (VR), the value length (VL) and the value field (VF). The layout of each part is shown in Tables A-1 and A-2.

Table A-1: Data elements of explicit VR for type OB, OW, OF, SQ, UT, or UN

Element Tag: Group Number (GN), 2 bytes; Element Number (EN), 2 bytes
Value Representation (VR): 2 bytes (OB, OW, OF, SQ, UT or UN)
Reserved: 2 bytes (0000H)
Value Length (VL): 4 bytes
Value Field (VF): binary conversion information, in accordance with the VL

Table A-2: Data elements for other types of explicit VRs

Element Tag: Group Number (GN), 2 bytes; Element Number (EN), 2 bytes
Value Representation (VR): 2 bytes (all other VRs)
Value Length (VL): 2 bytes
Value Field (VF): binary conversion information, in accordance with the VL

(1) Element Tag. The element tag is a 4-byte unsigned integer: the first 2 bytes hold the group number of the element and the last 2 bytes hold its element number. The tag can therefore be used as an index to quickly look up and locate a data element in the program code.

(2) Value Representation (VR). The value representation occupies 2 bytes and indicates the data type of the element. If the transfer syntax uses implicit VR, this field is omitted. If the transfer syntax uses explicit VR, the field contains the type of the data element, as defined in the data type description of the DICOM standard. There are 27 values, such as AS (Age String), CS (Code String), DA (Date) and so on.

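(3) Value Length (VL). The value length gives the length in bytes of the value field. It occupies 4 bytes when the VR is implicit and also for explicit VRs of type OB, OW, OF, SQ, UT or UN (Table A-1); for all other explicit VRs it occupies 2 bytes (Table A-2). The value length must always be an even number of bytes, so an odd-length value is padded with one trailing byte: a null byte (00H) for unique identifiers (UI) such as the transfer syntax, and a space for most other text values. For an SQ (sequence) element, the value length can take the special value FFFFFFFFH, which denotes an undefined length.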

(4) Value Field (VF). The actual value of the data element is stored in the value field. For example, a date (VR DA) is stored as an 8-character string of the form YYYYMMDD, such as "20000101".
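Tables A-1 and A-2, together with items (1) to (4), translate directly into a header parser. The sketch below assumes a little-endian transfer syntax (big-endian data would need ">" instead of "<" in the `struct` formats) and uses the hypothetical helper name `read_element_header`. It returns the position at which the value field starts, so that the caller can slice out the value or skip to the next element.

```python
import struct

# Explicit VRs that use the long layout of Table A-1 (2 reserved bytes + 4-byte VL).
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}


def read_element_header(buf: bytes, offset: int, explicit_vr: bool = True):
    """Parse one little-endian data element header starting at `offset`.

    Returns (group, element, vr, value_length, value_offset); vr is None
    for implicit VR.
    """
    group, element = struct.unpack_from("<HH", buf, offset)  # 4-byte tag
    pos = offset + 4
    if not explicit_vr:
        # Implicit VR: the tag is followed directly by a 4-byte VL.
        (length,) = struct.unpack_from("<I", buf, pos)
        return group, element, None, length, pos + 4
    vr = buf[pos:pos + 2]
    pos += 2
    if vr in LONG_VRS:
        # Table A-1: skip the 2 reserved bytes (0000H), then read a 4-byte VL.
        (length,) = struct.unpack_from("<I", buf, pos + 2)
        pos += 6
    else:
        # Table A-2: a 2-byte VL follows the VR.
        (length,) = struct.unpack_from("<H", buf, pos)
        pos += 2
    return group, element, vr.decode("ascii"), length, pos


# Example: patient name (0010,0010), VR PN, value "DOE^JOHN".
buf = struct.pack("<HH", 0x0010, 0x0010) + b"PN" + struct.pack("<H", 8) + b"DOE^JOHN"
g, e, vr, vl, start = read_element_header(buf, 0)
print(f"({g:04X},{e:04X}) {vr} {buf[start:start + vl].decode('ascii')}")
```

The next element then starts at `value_offset + value_length`, except for an SQ element of undefined length (FFFFFFFFH), which this sketch does not handle.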

In a medical image, the pixel data is stored in the (7FE0,0010) data element. Depending on the requirements of communication or storage, the pixel data can be stored with lossy compression, with lossless compression or uncompressed. Because no transmission is needed in this thesis, the uncompressed DICOM image is used directly. Each pixel occupies 2 bytes, of which 16, 12 or 8 bits carry image information (usually 12 bits). The data element structure is shown in Figure A-1.

Figure A-1: Structure of data elements (a data set is a sequence of data elements; each data element consists of an element tag, value type, value length and value field)

DICOM CT Image Analysis

The proposed algorithm first reads the basic file information of the DICOM image and checks whether the file is in DICOM format. It then parses the image structure information, extracts the pixel data and finally converts it. The overall process is similar to reading other image formats; the main work lies in acquiring and validating the image structure information so that the pixel data can be read correctly. The parsing process is shown in Figure A-2.

1. Read the DICOM image file into computer memory. Skip the 128-byte preamble of 00H bytes, read the next 4 bytes and convert them to characters using their ASCII codes. If the characters are "DICM", the file is a DICOM format image; otherwise it is not.

Figure A-2: DICOM Image Resolution Flow Chart (Start; DICOM image or not (Y/N); parse transfer syntax; parse image information; extract pixel information; convert and display; End)

2. Read the first part of the data, i.e., groups 0002 to 0007 of the Default Dataset. This is the basic step that sets the rules for reading the subsequent data sets. For example, group 0002 contains the syntax rules that determine how the length and position of the subsequent data are read. The transfer syntax stored in data element (0002,0010) falls mainly into three categories: implicit VR little endian, explicit VR little endian and explicit VR big endian. Table A-3 lists the transfer syntax values and their meanings.
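The three transfer syntaxes of Table A-3 can be held in a lookup table that tells the parser whether VRs are explicit and which byte order to use. This is a sketch; the dictionary name and the (explicit_vr, little_endian) tuple layout are assumptions of this example, and compressed transfer syntaxes are deliberately rejected, since only uncompressed images are used in this thesis.

```python
# Table A-3 as a lookup: UID -> (explicit_vr, little_endian)
TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": (False, True),    # Implicit VR Little Endian
    "1.2.840.10008.1.2.1": (True, True),   # Explicit VR Little Endian
    "1.2.840.10008.1.2.2": (True, False),  # Explicit VR Big Endian
}


def decode_transfer_syntax(raw: bytes):
    """Decode the (0002,0010) value and look up how to parse the data sets."""
    uid = raw.decode("ascii").rstrip("\x00 ")  # drop the even-length padding byte
    try:
        return uid, TRANSFER_SYNTAXES[uid]
    except KeyError:
        raise ValueError(f"unsupported transfer syntax: {uid}") from None


print(decode_transfer_syntax(b"1.2.840.10008.1.2\x00"))
```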

Table A-3: Transfer Syntax Comparison Table

1.2.840.10008.1.2      Implicit VR Little Endian Transfer Syntax
1.2.840.10008.1.2.1    Explicit VR Little Endian Transfer Syntax
1.2.840.10008.1.2.2    Explicit VR Big Endian Transfer Syntax

3. Read the second part of the data, i.e., group 0008 onwards. Data element (0008,0005) is the Specific Character Set; this attribute can be divided into the ISO_IR character sets and other character sets. Group 0010 contains patient and diagnostic information, such as the patient's age, sex, date of birth and other basic information. Group 0028 contains image information, such as the image size, bit width, bit allocation, window width, window center, grayscale specification and other image file information. When parsing image files, the image dimensions and these image attributes are essential because they provide the basis for the next stage.

4. Determine whether the image data is intact or corrupted by locating the data element (7FE0,0010), which has the largest number of bytes of all the data elements. As found in the previous step, 16 bits are allocated to each pixel, of which 12 bits carry valid pixel information. The pixel data is extracted according to the transfer syntax rules and the complete data part is arranged in image order.
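As an illustration of this step, the following sketch checks the length of the (7FE0,0010) value against the image size read from group 0028 and keeps the 12 valid bits of each 16-bit little-endian pixel. Treating a length mismatch as a corrupted image, and assuming that the valid bits are the low-order bits of unsigned pixels, are simplifications made for this example; the function name `extract_pixels` is hypothetical.

```python
import struct


def extract_pixels(pixel_bytes: bytes, rows: int, cols: int, bits_stored: int = 12):
    """Unpack uncompressed little-endian 16-bit pixel data into a list of rows.

    Assumes unsigned pixels whose valid bits are the low-order bits.
    """
    expected = rows * cols * 2  # 16 bits (2 bytes) allocated per pixel
    if len(pixel_bytes) != expected:
        # Simple integrity check: the value length must match the image size.
        raise ValueError(f"expected {expected} bytes of pixel data, got {len(pixel_bytes)}")
    mask = (1 << bits_stored) - 1  # e.g. 0x0FFF keeps the 12 valid bits
    values = struct.unpack(f"<{rows * cols}H", pixel_bytes)
    flat = [v & mask for v in values]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# A 2 x 2 example image; the upper 4 bits of 0xF001 are discarded.
data = struct.pack("<4H", 0x0FFF, 0xF001, 100, 0)
print(extract_pixels(data, 2, 2))  # [[4095, 1], [100, 0]]
```

Signed pixel data would additionally need sign extension of the stored bits, which this sketch omits.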

5. After the complete pixel data has been obtained, a corresponding file header (for example BMP or JPG) can be added so that the file can be read by ordinary image software. However, this research uses a different approach. Depending on the grayscale distribution of the pixels, each 16-bit pixel can be converted to a single 8-bit pixel. A grayscale value can also be converted to a CT value, the quantity used in medical imaging. The CT value is linearly related to the gray value, CT value = slope × gray value + intercept, where the slope and intercept are read from the DICOM data elements Rescale Slope (0028,1053) and Rescale Intercept (0028,1052). CT values distinguish tissue composition; in lung CT they range from about -1000 HU to +1000 HU. The CT value of lung parenchyma is generally around -600 HU, that of air is -1000 HU, that of water is 0 HU and that of bone is about +1000 HU.
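Steps 4 and 5 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the exact conversion used in this research: it assumes uncompressed little-endian 16-bit pixel words with 12 stored bits and an unsigned pixel representation, uses a default intercept of -1024 only as a placeholder for the value read from Rescale Intercept (0028,1052), and applies a simple linear lung window (center -600 HU, width 1500 HU) chosen here for illustration. The function names are hypothetical.

```python
import struct

def unpack_pixels(pixel_bytes):
    """Split raw Pixel Data (7FE0,0010) bytes into little-endian 16-bit words."""
    count = len(pixel_bytes) // 2
    return list(struct.unpack("<%dH" % count, pixel_bytes[:2 * count]))

def to_hu(stored, slope=1.0, intercept=-1024.0, bits_stored=12):
    """Convert stored pixel values to CT values via HU = slope * value + intercept.

    Only the low `bits_stored` bits of each 16-bit word carry pixel data, so the
    upper bits are masked off first (assumes an unsigned pixel representation).
    The default intercept is a placeholder for the value read from (0028,1052).
    """
    mask = (1 << bits_stored) - 1
    return [slope * (v & mask) + intercept for v in stored]

def window_to_8bit(hu_values, center=-600.0, width=1500.0):
    """Linearly map HU values inside [center - width/2, center + width/2] onto 0..255."""
    low = center - width / 2.0
    high = center + width / 2.0
    result = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)  # values outside the window saturate
        result.append(round((clipped - low) / (high - low) * 255))
    return result

# Usage with three hypothetical stored values (0, 1024 and 2024):
raw = struct.pack("<3H", 0, 1024, 2024)
hu = to_hu(unpack_pixels(raw))
print(hu)  # [-1024.0, 0.0, 1000.0]
print(window_to_8bit(hu))
```

The window mapping above is the simplest linear form; the DICOM standard defines a slightly different VOI LUT formula, so results can differ by a gray level near the window edges.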

An example of DICOM file binary encoding is shown in Figure A-3, and the parsing of this example is described below.


Figure A-3: DICOM File Binary Encoding

00H: 128 bytes of 00H, forming the file preamble.

44 49 43 4D: the ASCII codes of the 4-byte prefix "DICM".

02 00 10 00: the data element tag (0002,0010), the Transfer Syntax UID, stored little-endian.

55 49: the ASCII codes of "UI", the value representation (VR) of this element (unique identifier).

12 00: the value length (VL), the little-endian hexadecimal number 0x0012, i.e. 18 bytes.

31 2E 32 2E ... 31 2E 32 00: the value "1.2.840.10008.1.2" (Implicit VR Little Endian Transfer Syntax); its 17 characters are padded with one 00H byte to the even length of 18. Following the tag positions and the encoding rules, the remaining data elements are read sequentially in the same way, until the image information and the pixel data are finally obtained and converted.
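The decoding walked through above can be reproduced with a short standard-library Python sketch. It assumes a file whose File Meta Information group (0002) is encoded as Explicit VR Little Endian, which the standard requires for that group, and it looks up the decoded UID among the three entries of Table A-3. The function names and the written-out set of VRs with 4-byte lengths are illustrative assumptions.

```python
import struct

# Table A-3: transfer syntax UID -> (name, explicit VR?, little endian?)
TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": ("Implicit VR Little Endian", False, True),
    "1.2.840.10008.1.2.1": ("Explicit VR Little Endian", True, True),
    "1.2.840.10008.1.2.2": ("Explicit VR Big Endian", True, False),
}

# Explicit VRs whose header has 2 reserved bytes followed by a 4-byte length.
LONG_VRS = {"OB", "OD", "OF", "OL", "OV", "OW", "SQ", "SV", "UC", "UN", "UR", "UT", "UV"}

def iter_meta_elements(data):
    """Yield ((group, element), VR, value bytes) for each group-0002 element."""
    if len(data) < 132 or data[128:132] != b"DICM":
        raise ValueError("missing 128-byte preamble or 'DICM' prefix")
    offset = 132
    while offset + 8 <= len(data):
        group, element = struct.unpack_from("<HH", data, offset)
        if group != 0x0002:
            break  # end of the File Meta Information group
        vr = data[offset + 4:offset + 6].decode("ascii")
        if vr in LONG_VRS:
            (length,) = struct.unpack_from("<I", data, offset + 8)
            start = offset + 12
        else:
            (length,) = struct.unpack_from("<H", data, offset + 6)
            start = offset + 8
        yield (group, element), vr, data[start:start + length]
        offset = start + length

def read_transfer_syntax(data):
    """Return the (0002,0010) UID and its Table A-3 entry (None if not listed)."""
    for tag, _vr, value in iter_meta_elements(data):
        if tag == (0x0002, 0x0010):
            uid = value.rstrip(b"\x00").decode("ascii")  # strip the 00H padding byte
            return uid, TRANSFER_SYNTAXES.get(uid)
    raise ValueError("no Transfer Syntax UID (0002,0010) found")

# Rebuild the bytes of Figure A-3: preamble, "DICM", tag 02 00 10 00, VR "UI", VL 12 00.
uid = b"1.2.840.10008.1.2\x00"
sample = (bytes(128) + b"DICM" + struct.pack("<HH", 0x0002, 0x0010)
          + b"UI" + struct.pack("<H", len(uid)) + uid)
print(read_transfer_syntax(sample))
# ('1.2.840.10008.1.2', ('Implicit VR Little Endian', False, True))
```

Real files normally begin the meta group with (0002,0000) and (0002,0001) before (0002,0010); the loop skips such elements, including the OB-typed (0002,0001) with its 4-byte length.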

DICOM Images and Other Formats

Through the conformance statement, the DICOM standard specifies the documentation and information necessary for medical imaging and also provides private data sets for different manufacturers. These private data sets carry no standard meaning by themselves; each device manufacturer defines and uses them. The main differences between DICOM and other image formats are described below.

Complexity of Image Format

The DICOM data structure keeps the file header simple, with only a basic image identifier at the beginning of the file. All other information is organized by group number and element number into structured, complete records, including image information, patient information, equipment information and diagnostic information; many private data set definitions are also reserved for use by different manufacturers. Other image formats, in contrast, contain only basic file information and pixel data, and their structure is simple.

Image Parsing Efficiency

Information in a DICOM image is located by its ‘Group Number’ and ‘Element Number’. This indexing mechanism makes it easy to query and access specific information. In other image formats, information is positioned at specified byte offsets and its actual position is then calculated through pointers, which makes queries slow; the location of the image pixel data is determined and adjusted in the same way.
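The tag-based access described above can be illustrated with a minimal sketch that scans an Implicit VR Little Endian data set once and stores every element under its (group, element) key, after which a field such as Rows (0028,0010) is read directly by its tag. It assumes a flat data set with no sequences or undefined lengths; the helper names are hypothetical.

```python
import struct

def index_implicit_le(dataset):
    """Map (group, element) -> raw value bytes for an Implicit VR Little Endian data set.

    Each element is a 2-byte group, a 2-byte element and a 4-byte value length,
    followed by the value itself.
    """
    index = {}
    offset = 0
    while offset + 8 <= len(dataset):
        group, element, length = struct.unpack_from("<HHI", dataset, offset)
        if length == 0xFFFFFFFF:
            raise ValueError("undefined-length elements (sequences) are not handled here")
        start = offset + 8
        index[(group, element)] = dataset[start:start + length]
        offset = start + length
    return index

def encode_element(group, element, value):
    """Hypothetical helper that encodes one Implicit VR Little Endian element."""
    return struct.pack("<HHI", group, element, len(value)) + value

# Usage: Patient's Sex (0010,0040), Rows (0028,0010) and Columns (0028,0011).
dataset = (encode_element(0x0010, 0x0040, b"M ")
           + encode_element(0x0028, 0x0010, struct.pack("<H", 512))
           + encode_element(0x0028, 0x0011, struct.pack("<H", 512)))
index = index_implicit_le(dataset)
rows = struct.unpack("<H", index[(0x0028, 0x0010)])[0]
print(rows)  # 512
print(index[(0x0010, 0x0040)].decode().strip())  # M
```

The scan itself is still sequential; the benefit is that every later lookup goes through the tag rather than through a recomputed byte position.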

Uniqueness of the Image

The DICOM image is a special image format for medical diagnosis. The file records the image information, patient information, diagnostic information, overlay information, device information and so on. Unlike BMP and JPEG, it is not a general-purpose format: because it is more specific and more specialized, it is applied only to medical diagnostics. DICOM and other commonly used image formats follow broadly similar structural encoding principles, but they differ considerably in structural mechanism and application area, and the DICOM image is of great significance in medical applications.


APPENDIX B

ANALYSIS OF XML INFORMATION OF PULMONARY NODULES

The diagnosis of early lung cancer symptoms is quite difficult. A non-medical person is unlikely to be able to recognize pulmonary nodule symptoms, and even experienced radiologists may misdiagnose or miss nodules. A research standard is therefore needed. The LIDC data resources provide researchers not only with DICOM image sequences but also with the diagnostic information on pulmonary nodules produced by four expert radiologists in a two-phase annotation process, in the form of XML (Extensible Markup Language) files. This XML diagnostic information from the four radiologists consists of the case identifier, the nodule and non-nodule numbers, the nodule contour coordinates and so on. Since system researchers cannot diagnose pulmonary nodules themselves, whereas the four radiologists' diagnostic information is comprehensive and authoritative, the XML document serves as the standard reference for pulmonary nodules and provides a basis for comparing the results of the proposed system. The steps to analyze the XML are as follows:

1. Load the pulmonary nodule XML file, obtain the version number and date, and read the study instance identification number.

2. Go to the <readingSession> node. Each such node holds the reading of one radiologist and contains the radiologist's number, the features of nodules and non-nodules, and the contour coordinates.

3. Within each <readingSession>, go to the radiologist number node <servicingRadiologistID>, the pulmonary nodule nodes <unblindedReadNodule>, the non-nodule nodes <nonNodule> and the other sub-nodes. Record the number of radiologists and the numbers of pulmonary nodules and non-nodules.

4. Query the child nodes of each <unblindedReadNodule> node to obtain the nodule number <noduleID>, the nodule characteristics <characteristics> and the nodule region <roi>. <characteristics> contains features such as calcification, while <roi> contains the image slice position, the nodule inclusion flag and the contour coordinates. The <nonNodule> nodes are queried in the same way to obtain the non-nodule information.

5. Extract the nodule contour coordinates from each nodule region <roi>; these provide the data for subsequent decisions. The LIDC-XML structure is shown in Figure B-1.
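The five steps above can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal illustration, not the parser used in this research: it strips XML namespaces so that the tag names of Figure B-1 can be used directly, and it assumes that each <edgeMap> holds <xCoord> and <yCoord> children and that <characteristics> holds a <calcification> child, which is the layout of the LIDC-IDRI annotation files but is not stated in the text above. The function names and the sample file are hypothetical.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Drop '{namespace}' prefixes so tags can be matched by their local names."""
    for el in root.iter():
        if isinstance(el.tag, str) and "}" in el.tag:
            el.tag = el.tag.split("}", 1)[1]
    return root

def parse_lidc_xml(text):
    """Summarize each <readingSession>: radiologist ID, nodules with ROIs, non-nodule count."""
    root = strip_namespaces(ET.fromstring(text))
    summary = {"study_uid": root.findtext(".//StudyInstanceUID"), "sessions": []}
    for session in root.iter("readingSession"):
        nodules = []
        for nodule in session.findall("unblindedReadNodule"):
            rois = []
            for roi in nodule.findall("roi"):
                contour = [(int(e.findtext("xCoord")), int(e.findtext("yCoord")))
                           for e in roi.findall("edgeMap")]
                rois.append({
                    "z": float(roi.findtext("imageZposition")),
                    "inclusion": roi.findtext("inclusion", "TRUE").strip().upper() == "TRUE",
                    "contour": contour,
                })
            nodules.append({
                "id": nodule.findtext("noduleID"),
                "calcification": nodule.findtext("characteristics/calcification"),
                "rois": rois,
            })
        summary["sessions"].append({
            "radiologist": session.findtext("servicingRadiologistID"),
            "nodules": nodules,
            "non_nodules": len(session.findall("nonNodule")),
        })
    return summary

# Usage with a tiny hand-written file (one radiologist, one nodule, one non-nodule):
SAMPLE = """<LidcReadMessage xmlns="http://www.nih.gov">
  <ResponseHeader><StudyInstanceUID>1.2.3</StudyInstanceUID></ResponseHeader>
  <readingSession>
    <servicingRadiologistID>R1</servicingRadiologistID>
    <unblindedReadNodule>
      <noduleID>N1</noduleID>
      <characteristics><calcification>6</calcification></characteristics>
      <roi>
        <imageZposition>-125.5</imageZposition>
        <inclusion>TRUE</inclusion>
        <edgeMap><xCoord>310</xCoord><yCoord>352</yCoord></edgeMap>
        <edgeMap><xCoord>311</xCoord><yCoord>351</yCoord></edgeMap>
      </roi>
    </unblindedReadNodule>
    <nonNodule><nonNoduleID>X1</nonNoduleID></nonNodule>
  </readingSession>
</LidcReadMessage>"""
result = parse_lidc_xml(SAMPLE)
print(result["sessions"][0]["nodules"][0]["rois"][0]["contour"])  # [(310, 352), (311, 351)]
```

Counting the entries in result["sessions"] gives the number of radiologists recorded in step 3, and the per-ROI contours are the coordinate data extracted in step 5.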


XML file: <SeriesInstanceUID> (series instance identification number), <StudyInstanceUID> (study instance identification number) and one <readingSession> (radiologist diagnostic information) per radiologist, ① to ④

<readingSession>: <servicingRadiologistID> (servicing radiologist number), <unblindedReadNodule> (nodule information), <nonNodule> (non-nodule information)

<unblindedReadNodule>: <noduleID> (nodule number), <characteristics>, <roi> (nodule contour)

<roi>: <imageZposition>, <imageSOP_UID>, <inclusion> (nodule identification), <edgeMap> (nodule contour coordinates)

<nonNodule>: <nonNoduleID>, <imageZposition>, <locus>, <imageSOP_UID>

Figure B-1: Structure of Pulmonary Nodules’ XML


ABBREVIATIONS

WHO: World Health Organization

DICOM: Digital Imaging and Communications in Medicine

LDA: Linear Discriminant Analysis

KNN: K-Nearest-Neighbour

SVM: Support Vector Machine

LIDC: Lung Image Database Consortium

ELCAP: Early Lung Cancer Action Program

ROI: Region of Interest

CAD: Computer Aided Detection

CT: Computed Tomography

PET: Positron Emission Tomography

MRI: Magnetic Resonance Imaging

FP: False Positive

FP/scan: False Positives per Scan

HU: Hounsfield Unit

GLCM: Gray Level Co-occurrence Matrix

IDM: Inverse Difference Moment

ROC: Receiver Operating Characteristic


FROC: Free-response ROC

TCIA: The Cancer Imaging Archive

LIDC-IDRI: Lung Image Database Consortium and Image Database Resource Initiative

GS: Gold Standard

DSC: Dice similarity coefficient

ACCU: Accuracy

OM: Overlap Measure

SEN: Sensitivity

SPEC: Specificity

PPV: Positive Predictive Value

RmsD: Root Mean Square Difference of the Distance Between the Segmentation and the Ground Truth

AD: Mean Absolute Surface Distance

GN: Group Number

EN: Element Number

VR: Value Representation

VL: Value Length

VF: Value Field

AS: Age String


CS: Code String

DA: Date

BMP: Bitmap

JPEG: Joint Photographic Experts Group

ASCII: American Standard Code for Information Interchange

XML: Extensible Markup Language

GGO: Ground Glass Opacity

LOLA11: Lobe and Lung Analysis 2011

MGRF: Markov-Gibbs Random Field

GSS: Gaussian Scale Space

PA: Postero-Anterior

SIFT: Scale Invariant Feature Transform

PCA: Principal Component Analysis

EM: Expectation-Maximization

ACACM: Adaptive Crisp Active Contour Method

RBF: Radial Basis Function
