AN EFFICIENT SCHEME FOR LUNG
NODULE DETECTION
Furqan Shaukat
11-UET/PhD-EE-51
Supervisor
Prof. Dr. Gulistan Raja
DEPARTMENT OF ELECTRICAL ENGINEERING
FACULTY OF ELECTRONICS & ELECTRICAL ENGINEERING
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
TAXILA
April 2018
AN EFFICIENT SCHEME FOR LUNG NODULE
DETECTION
Author
Furqan Shaukat
11-UET/PhD-EE-51
A thesis submitted in partial fulfillment of the requirements for the degree of
Ph.D. Electrical Engineering
Thesis Supervisor:
Prof. Dr. Gulistan Raja
Electrical Engineering Department
UET Taxila
DEPARTMENT OF ELECTRICAL ENGINEERING
FACULTY OF ELECTRONICS & ELECTRICAL ENGINEERING
UNIVERSITY OF ENGINEERING AND TECHNOLOGY, TAXILA
April 2018
DECLARATION
I certify that the research work titled “An Efficient Scheme for Lung Nodule Detection” is my own
work. The work has not been presented elsewhere for assessment. Where material has been used
from other sources, it has been properly acknowledged/referenced.
Signature of Student
Furqan Shaukat
11-UET/PhD-EE-51
DEDICATION
…to the loving memory of my Father, who always believed in me.
ACKNOWLEDGEMENTS
First of all, I am very thankful to Almighty ALLAH who has given me the strength and courage
to work on the thesis.
Special thanks to my supervisor Prof. Dr. Gulistan Raja for his guidance and technical support
in the development of the thesis. Our relationship has been a long journey, starting from my
master's degree, and I learnt from him the art of being committed to the task and being
professional. He has been very supportive and kind throughout this thesis.
I would also like to thank Prof. Alejandro Frangi for his continuous support and help during
my stay at CISTIB, Department of Electronic and Electrical Engineering, University of
Sheffield. It has been my privilege to work with him.
Last but not least, I am grateful for the patience of my family, who suffered a lot during my
work on the thesis.
Furqan Shaukat
TABLE OF CONTENTS
DECLARATION
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
EXECUTIVE SUMMARY
Chapter 1: INTRODUCTION
1.1 Research Background and Significance
1.2 Lung Cancer and Nodules
1.2.1 Imaging Features and Analysis of Pulmonary Nodules CT
1.3 Computer Aided Detection
1.4 Organization of Thesis
Chapter 2: LITERATURE REVIEW
2.1 Lung Segmentation
2.1.1 Shape-Based Techniques
2.1.2 Edge Based Techniques
2.1.3 Thresholding Based Techniques
2.1.4 Deformable Boundary Techniques
2.2 Lung Nodule Detection
2.3 False Positive Reduction
2.4 Problem Statement
Chapter 3: PROPOSED SCHEME FOR LUNG NODULE DETECTION
3.1 Lung Segmentation
3.1.1 Lung Image Preprocessing
3.1.2 Lung Parenchyma Segmentation
3.2 Image Enhancement and Nodule Detection
3.2.1 Theoretical Research on Image Enhancement Algorithm
3.2.2 Multi-Scale Enhancement Algorithm Based on Hessian Matrix
3.3 Lung Nodule Detection and Classification
3.3.1 Rule-Based Analysis of Lung Nodule Candidates
3.3.2 Feature Extraction
3.4 Classification of Pulmonary Nodules
3.4.1 Support Vector Machine Classifier
Chapter 4: RESULTS AND DISCUSSION
4.1 Dataset and Hit Criteria
4.1.1 DICOM Resources
4.2 Experimental Environment
4.2.1 Image Preprocessing Module
4.2.2 Lung Segmentation Module
4.2.3 Image Enhancement Module
4.2.4 Lung Nodule Segmentation Module
4.2.5 Lung Nodule Classification Module
4.2.6 Supplementary Functions Module
4.3 Classification Results of SVM with Different Kernel Functions
4.4 Classification Results of SVM with Different Kernel Scale and Penalty Factor
4.5 Classification Results of Different Classifiers
4.6 Feature Ranking
4.7 Comparison with Other Systems
Chapter 5: CONCLUSION AND FUTURE PROSPECT
5.1 Conclusion
5.2 Follow-up Work and Prospects
REFERENCES
APPENDIX A
APPENDIX B
ABBREVIATIONS
LIST OF FIGURES
Figure 1-1: Sample images of four nodule groups (encircled). From left to right, well-circumscribed, juxta-vascular, juxta-pleural and pleural tail nodules.
Figure 2-1: Process of lung nodule detection, consisting of image acquisition followed by lung segmentation, nodule detection and false positive reduction or classification.
Figure 3-1: Flow Chart of the Proposed Method
Figure 3-2: Lung Parenchymal Segmentation Flow Chart
Figure 3-3: Example images of lung volume segmentation; (a) to (e), from left to right, present the input, thresholded, hole-filled, lung-segmented and contour-corrected images, respectively.
Figure 3-4: (a) A parenchymal image, (b) an image after repair of the lung parenchyma and (c) a zoomed view of the left and right lung contour separation.
Figure 3-5: Multi-scale circular filter enhancement algorithm flow chart
Figure 3-6: Example images showing results of image enhancement at different slices. (a) and (b) show a low-density nodule (red circle) that is detected after image enhancement, whereas (c) and (d) show two other slices after image enhancement.
Figure 3-7: Examples of detected candidates: (a) nodules, (b) non-nodules. Nodule diversity and the close resemblance of nodules to other anatomic structures in the lung region make detection more challenging and produce false positives, which are reduced with the aid of a classifier.
Figure 3-8: Flow Chart of Classification Process
Figure 3-9: Pulmonary Nodule Classification Results
Figure 4-1: User Interface of Lung CAD
Figure 4-2: Sample Output of Image Processing Module
Figure 4-3: Sample Output of Lung Segmentation Module
Figure 4-4: Sample Output of Image Enhancement Module
Figure 4-5: Sample Results of Image Preprocessing, Segmentation and Enhancement Modules on Top, Bottom and Middle Slices of Different Scans
Figure 4-6: Grid Search Results for SVM-Gaussian
Figure 4-7: Grid Search Results for SVM-Cubic
Figure 4-8: Grid Search Results for SVM-Quadratic
Figure 4-9: ROC curves of the SVM classifier with different kernel functions using (a) 2-fold, (b) 5-fold and (c) 7-fold schemes
Figure 4-10: ROC curves of the SVM classifier with (a) different kernel scale 𝛾 values varying from 0.3 to 3 and (b) different penalty parameter C values varying from 1 to 4
Figure 4-11: (a) ROC curves of the SVM classifier with different feature classes, (b) ROC curves of different classifiers
Figure 4-12: Number of False Negatives with respect to Size
Figure 4-13: Percentage of False Negatives with respect to Size
Figure 4-14: Detection Sensitivity with respect to Nodule Size
Figure 4-15: Comparison of System’s Overall Performance w.r.t. Different Agreement Levels
Figure 4-16: Sample missed nodules (false negatives) of the proposed system, indicated by red arrows. Encircled objects in the respective figures represent false positives.
Figure 4-17: Sample images of nodules detected (highlighted) by the proposed system (true positives). The arrow in the respective figures indicates a false positive detected by the system along with the true positive.
Figure 4-18: FROC curves of the proposed system with respect to the different kernel functions of the SVM classifier
Figure A-1: Structure of Data Elements
Figure A-2: DICOM Image Resolution Flow Chart
Figure A-3: DICOM File Binary Encoding
Figure B-1: Structure of Pulmonary Nodules’ XML
LIST OF TABLES
Table 2-1: Review of Lung Segmentation Techniques
Table 2-2: Review of Lung Nodule Detection Methods
Table 2-3: Review of Current CAD Systems
Table 3-1: Extracted Features of Nodule Candidates
Table 3-2: Feature Correlation Information
Table 4-1: Classification Results of SVM on test dataset with different kernel functions
Table 4-2: Classification Results of SVM-Gaussian on test dataset using different γ values
Table 4-3: Classification Results of SVM-Gaussian on test dataset using different C values
Table 4-4: Classification Results of different classifiers on test dataset
Table 4-5: Classification Results of SVM-Gaussian on test dataset using different feature classes
Table 4-6: Performance Comparison of Different CAD Systems
Table 4-7: Average Scores of Different Characteristics of Sample False Negatives
Table A-1: Data elements of explicit VR for type OB, OW, OF, SQ, UT, or UN
Table A-2: Data elements for other types of explicit VRs
Table A-3: Transfer Syntax Comparison Table
EXECUTIVE SUMMARY
Lung cancer has been one of the major threats to human life for decades in both developed and
underdeveloped countries, with one of the lowest survival rates after diagnosis. The survival rate
can be increased by early nodule detection, and Computer Aided Detection (CAD) can be an
important tool for detecting lung nodules early and preventing deaths caused by lung cancer. In
this dissertation, we propose a novel technique for lung nodule detection using a hybrid feature
set. The proposed method starts with pre-processing to remove noise from the input images,
followed by lung segmentation using optimal thresholding. The image is then enhanced using
multi-scale dot enhancement filtering prior to nodule detection and feature extraction. Finally,
lung nodules are classified using a Support Vector Machine (SVM) classifier. The feature set
consists of intensity, shape (2D and 3D) and texture features, which have been selected to
optimize the sensitivity and reduce false positives. In addition to SVM, other supervised
classifiers such as K-Nearest-Neighbour (KNN), Decision Tree and Linear Discriminant Analysis
(LDA) have been used for performance comparison. The extracted features have also been
compared class-wise to determine the most relevant features for lung nodule detection. The
proposed system has been evaluated on 850 scans from the Lung Image Database Consortium
(LIDC) dataset using a k-fold cross-validation scheme. The main research work done in this
dissertation is summarized below.
1. The proposed method starts with the segmentation of the lung volume from pre-processed input
CT images. Lung segmentation is critically important because it is a prerequisite for nodule
detection, and any inaccuracy in the segmented lung volume can reduce the accuracy of the whole
system. In this dissertation, we propose a fully automated method for segmenting the lung volume
from CT images, which consists of a series of steps. Initially, the CT image is segmented using
optimal thresholding, the lung volume is obtained using connected component labeling, and other
irrelevant information is removed. The resulting image contains holes, which are filled with a
hole-filling algorithm (e.g., morphological operations). Finally, the lung contour is smoothed
with a rolling-ball algorithm to include any juxta-pleural nodules.
2. After lung segmentation, the image is enhanced so that low-density nodules can be detected.
Image enhancement plays an important role in detecting these nodules: it strengthens them while
weakening other structures in the lung region, which also reduces false positives. In this thesis,
a multi-scale dot enhancement filter is used to detect low-density nodules that might otherwise
remain undetected and lower the accuracy of the system. In the first step, Gaussian smoothing is
applied to all the corresponding 2D slices to reduce noise and the sensitivity of the subsequent
derivatives to it. After Gaussian smoothing, the Hessian matrix and its eigenvalues 𝜆1 and 𝜆2
(|𝜆2|<|𝜆1|) are calculated for every pixel to determine the local shape of the structure. A
suspected pulmonary nodule appears as a circular or oval object, whereas vascular structures
appear as elongated, line-like objects, so this property can be used to distinguish the different
structures in the lung region. The process is repeated at different scales, and the filter outputs
are finally combined by taking the maximum value at each pixel to obtain the best enhancement
effect in the resultant image. After image enhancement, lung nodule candidates are detected using
optimal thresholding. A rule-based analysis based on initial measurements such as area, diameter
and volume then decides whether to keep or discard each detected nodule candidate. The advantage
of rule-based analysis is that it eliminates objects that are too small or too large to be nodule
candidates and thus reduces the workload of the next stage.
3. A hybrid feature set, obtained after rigorous experimentation, increases the classification
accuracy and considerably reduces the false positives per scan. The proposed feature set plays a
crucial role in the overall performance of the CAD system. We initially selected a large pool of
features and then trimmed it down on the basis of accuracy and false positives per scan to obtain
the proposed hybrid feature set.
4. The classification of pulmonary nodules is done using the SVM algorithm. In the classification
phase, the suspected pulmonary nodules are divided into true and false pulmonary nodules. SVM,
which separates classes with a maximum-margin hyperplane in a high-dimensional feature space,
performs well when it must decide between only two classes, i.e., nodule or non-nodule, as is the
case here. In addition, the Gaussian Radial Basis Function (RBF) kernel implicitly maps the
features into a higher-dimensional space in which the two classes become more nearly linearly
separable, which makes the detection and classification of pulmonary nodules more accurate.
5. We have evaluated the proposed system extensively on the Lung Image Database Consortium
(LIDC) dataset, which is publicly available from The Cancer Imaging Archive (TCIA). We used
850 scans (LIDC-IDRI-0001 to LIDC-IDRI-0844) of this dataset, which contain nodules of size
3-30 mm fully annotated by four expert radiologists in two consecutive sessions. A k-fold
cross-validation scheme with k = 5, 7 and 10 is used for model selection and validation, and an
exhaustive grid search is used to tune the hyperparameters of the SVM classifier. Other
classifiers have also been used to classify the lung nodule candidates, and an attempt has been
made to determine the most relevant feature class for the lung nodule detection system. The
achieved sensitivities at the detection and classification stages are 94.20% and 98.15%,
respectively, with only 2.19 FP/scan. These results indicate that the proposed scheme outperforms
other systems, with increased sensitivity and reduced FP/scan.
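Step 1 above (optimal thresholding, connected component labeling, hole filling and contour smoothing) can be illustrated with a minimal sketch for a single 2D slice. It is not the thesis implementation and rests on several assumptions: the slice is already in Hounsfield units (HU), NumPy and SciPy's `ndimage` module are available, "optimal" thresholding is taken to be the common iterative midpoint rule, and a morphological closing with a small disk stands in for the rolling-ball step. The names `optimal_threshold`, `disk` and `segment_lungs` and the parameters `n_lungs` and `close_radius` are hypothetical.

```python
import numpy as np
from scipy import ndimage


def optimal_threshold(image, tol=0.5):
    """Iterative threshold: move T to the midpoint of the mean intensities
    below and above it until it changes by less than `tol`."""
    t = float(image.mean())
    while True:
        low, high = image[image <= t], image[image > t]
        if low.size == 0 or high.size == 0:
            return t
        new_t = 0.5 * (float(low.mean()) + float(high.mean()))
        if abs(new_t - t) < tol:
            return new_t
        t = new_t


def disk(radius):
    """Binary disk used as a 2D structuring element."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius


def segment_lungs(ct_slice, n_lungs=2, close_radius=2):
    """Return a boolean lung mask for one 2D CT slice given in HU."""
    air = ct_slice <= optimal_threshold(ct_slice)  # lungs + air around the body
    labels, n = ndimage.label(air)                 # connected component labeling
    if n == 0:
        return np.zeros_like(air)
    # Components touching the image border are the air outside the patient.
    edge = np.concatenate([labels[0], labels[-1], labels[:, 0], labels[:, -1]])
    border = set(edge.tolist())
    sizes = ndimage.sum(air, labels, index=np.arange(1, n + 1))
    inner = sorted(
        ((size, lab) for lab, size in zip(range(1, n + 1), sizes) if lab not in border),
        reverse=True,
    )
    # Keep the largest inner air components (assumed: the two lung fields).
    mask = np.isin(labels, [lab for _, lab in inner[:n_lungs]])
    # Fill holes left by vessels and nodules inside the lung fields.
    mask = ndimage.binary_fill_holes(mask)
    # Stand-in for the rolling-ball step: a closing with a small disk
    # smooths the contour and fills indentations narrower than the disk.
    return ndimage.binary_closing(mask, structure=disk(close_radius))


# Usage on a synthetic slice: air outside, soft-tissue body, two lung fields,
# and a small soft-tissue "nodule" inside the left lung.
slice_hu = np.full((100, 100), -1000.0)
slice_hu[10:90, 5:95] = 40.0
slice_hu[20:80, 15:45] = -850.0
slice_hu[20:80, 55:85] = -850.0
slice_hu[48:52, 28:32] = 30.0
lung_mask = segment_lungs(slice_hu)
```

Discarding border-touching components works because, in an axial slice, that air lies outside the patient. Keeping the two largest remaining components is only a simple heuristic; it may fail when the trachea, a collapsed lung or merged lung fields produce components of unexpected size.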
The main contribution of this dissertation is a relatively simple nodule detection scheme that
performs very well in an extensive experimental analysis. In addition, the proposed feature set
has significantly reduced false positives and increased the sensitivity of the proposed system.
Moreover, a comparison has been made to determine the most relevant feature class in the extracted
feature set. Compared with previous methods, the overall sensitivity has been improved and the
FP/scan has been reduced significantly.
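The multi-scale dot enhancement of step 2 can be sketched in 2D as follows. This is a hedged illustration rather than the thesis code. It assumes NumPy and SciPy, computes the Hessian with Gaussian derivative filters (so smoothing and differentiation happen in one step), multiplies it by σ² so that responses at different scales are comparable, orders the eigenvalues so that |𝜆2| ≤ |𝜆1| as in the text, and uses the dot response |𝜆2|²/|𝜆1|, set to zero unless both eigenvalues are negative (a bright blob). The scales in `sigmas` and the function names `hessian_eigenvalues` and `dot_enhance` are assumptions.

```python
import numpy as np
from scipy import ndimage


def hessian_eigenvalues(image, sigma):
    """Eigenvalues of the sigma^2-normalized 2D Hessian at every pixel,
    ordered so that |lam1| >= |lam2|."""
    # Gaussian derivative filters smooth and differentiate in one step.
    hxx = sigma ** 2 * ndimage.gaussian_filter(image, sigma, order=(0, 2))
    hyy = sigma ** 2 * ndimage.gaussian_filter(image, sigma, order=(2, 0))
    hxy = sigma ** 2 * ndimage.gaussian_filter(image, sigma, order=(1, 1))
    # Closed-form eigenvalues of the symmetric matrix [[hxx, hxy], [hxy, hyy]].
    mean = 0.5 * (hxx + hyy)
    root = np.sqrt(0.25 * (hxx - hyy) ** 2 + hxy ** 2)
    a, b = mean + root, mean - root
    swap = np.abs(b) > np.abs(a)
    return np.where(swap, b, a), np.where(swap, a, b)


def dot_enhance(image, sigmas=(1.0, 2.0, 3.0)):
    """Multi-scale dot filter; keeps the maximum response over all scales."""
    image = np.asarray(image, dtype=float)
    best = np.zeros_like(image)
    for sigma in sigmas:
        lam1, lam2 = hessian_eigenvalues(image, sigma)
        bright_blob = (lam1 < 0) & (lam2 < 0)
        # |lam2|^2 / |lam1| is large only when both eigenvalues are large,
        # i.e. for round structures; lines have |lam2| close to 0.
        with np.errstate(divide="ignore", invalid="ignore"):
            z = np.where(bright_blob, lam2 ** 2 / np.abs(lam1), 0.0)
        best = np.maximum(best, z)
    return best


# Usage: a round blob (nodule-like) and a straight line (vessel-like).
yy, xx = np.mgrid[0:64, 0:64]
img = np.exp(-((yy - 20) ** 2 + (xx - 20) ** 2) / (2 * 2.0 ** 2))
img += np.where((xx >= 5) & (xx < 60), np.exp(-((yy - 45) ** 2) / (2 * 1.5 ** 2)), 0.0)
response = dot_enhance(img)
```

For a line, one eigenvalue is large and the other is close to zero, so |𝜆2|²/|𝜆1| stays near zero; for a round blob the two eigenvalues are similar and the response stays close to |𝜆1|. This is the shape property the text relies on to suppress vessels while keeping nodule-like structures.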
PERTINENT PUBLICATIONS
Article published in journal
[1] F. Shaukat, G. Raja, A. Gooya, and A. F. Frangi, “Fully automatic detection
of lung nodules in CT images using a hybrid feature set,” Med. Phys., vol. 44, no.
7, pp. 3615–3629, Jul. 2017.
Articles under review
[1] F. Shaukat, G. Raja, “Computer Aided Detection of Lung Cancer Nodules:
A Review,” (Submitted)
[2] F. Shaukat, G. Raja, “Artificial Neural Network based Classification of
Lung Nodules in CT Images Using Shape and Texture Features,” (Submitted)
Chapter 1: INTRODUCTION
1.1 Research Background and Significance
Lung cancer is one of the leading causes of death around the world and has one of the lowest
survival rates after diagnosis. The survival rate can be increased by early nodule detection [1].
Lung cancer occurs in both developed and underdeveloped countries [2]. According to one estimate,
225,000 people are diagnosed with lung cancer every year in the United States, costing $12 billion
in health care [1][3]. Another study shows that 433 Americans die of lung cancer every day [4].
In terms of lung cancer mortality, the deadliest year was 2005, with a staggering 159,292 deaths.
Since then there has been a mild decline of 2.3%, to 155,610 deaths in 2014. Men have been the
major victims of this disease, with a higher age-adjusted rate of 51.7 per 100,000 persons
compared with 34.7 per 100,000 persons in women. The rates are almost the same for black and
white women, while black men have a higher rate (45.7 per 100,000 persons) than white men (45.4
per 100,000 persons) [5].
The situation is even worse in underdeveloped countries. Lung cancer is the most common type of
cancer in Asia, with the highest risk in Southeast Asia [6]. According to one estimate, lung cancer
mortalities in Asia rose to an alarming 926,436 out of 1,033,881 cases in 2012, with a dismal
survival rate of 10.4% [7]. Another study shows that 51% of lung cancer cases occur in Asia [8].
Pakistan is also affected by lung cancer, and the danger is increasing every day: according to a
study conducted in January 2014, both the occurrence of and the mortality from lung cancer are
rising in the country. Apart from tobacco, which is the primary cause of this deadly disease, lack
of awareness, poor hygienic conditions and meat consumption are the other main contributing
factors [9].
The latest data released by the World Health Organization (WHO) in November 2014 state that about
14 million new cancer cases were diagnosed and 8.2 million cancer deaths occurred in 2012. Lung
cancer topped the list with 1.59 million deaths, followed by liver cancer with 745,000 deaths, so
lung cancer deaths exceeded liver cancer deaths by more than 113%. With this number increasing
every year, the treatment of lung cancer patients remains one of the most pressing concerns.
According to WHO statistics, the number of lung cancer deaths was 652,842, of which men accounted
for 70.3% and women for 29.7%, with lung cancer overtaking breast cancer as the leading cause of
cancer mortality in women. One estimate suggests that by 2020 the world population will reach
8 billion, the number of new cancer cases will reach 20 million and the death toll will reach
12 million, with lung cancer, which has the highest mortality rate, posing the biggest threat to
human health [8].
Another important factor that makes lung cancer so deadly is its five-year survival rate (17.7%),
the lowest among the leading cancers, compared with colon (64.4%), breast (89.7%) and prostate
(98.9%) cancer [1]. Early detection matters all the more because the survival rate of localized
disease (cancer confined to the lungs) is 55%; unfortunately, only 16% of cases are detected early,
and the survival rate drops to only 4% once the cancer spreads to other organs [4]. Estimates
suggest that by 2030, lung cancer deaths will reach around 10 million per year [2].
Given this situation, a new initiative called the Cancer Moonshot was launched in 2016 to boost
research on the prevention, diagnosis and treatment of cancer and to achieve a decade's worth of
progress in just five years [10]. According to one study, lung cancer mortality can be reduced by
20% through early detection [11].
1.2 Lung Cancer and Nodules
Lung cancer is characterized by the formation of cancerous nodules in the lung region or lung
periphery. Nodules can be defined as lung tissue abnormalities with a roughly spherical structure
and a diameter of up to 30 mm [12,13]. They can be classified into the following categories:
well-circumscribed, juxta-vascular, juxta-pleural and pleural tail. Well-circumscribed nodules are
solitary nodules with no attachment to neighboring vessels or other anatomical structures.
Juxta-vascular nodules are strongly attached to nearby vessels. Juxta-pleural nodules have a
portion attached to the nearby pleural surface. Pleural tail nodules have a tail, belonging to the
nodule itself, that forms a minute attachment to the nearby pleural wall [14]. Sample images of
the different nodule groups can be seen in Figure 1-1. The following section analyzes different
characteristics of lung nodules that play a key role in their detection.
Figure 1-1: Sample images of four nodule groups (encircled). From left to right, well-
circumscribed, juxta-vascular, juxta-pleural and pleural tail nodules.
1.2.1 Imaging Features and Analysis of Pulmonary Nodules CT
The complexity and diversity of lung nodule types make detection quite difficult. In imaging
diagnosis, the features of pulmonary nodules can help infer the nature and type of cancer, such as
benign/malignant and primary/secondary bronchial lung cancer, and differentiate it from other
pathological diseases to reach an appropriate qualitative diagnosis. The following section
describes the key imaging features of lung nodules and the diversity that makes their detection
complicated.
(1) Lung nodule size: As with other diseases, the diameter of a pulmonary nodule is one of the
primary indexes for judging whether it is benign or malignant. The size of a suspected lesion can
directly reflect its pathology and can be the most intuitive and easiest criterion to judge. By
size, pulmonary nodules can be divided into three categories: (i) large nodules, (ii) small nodules
and (iii) very small or micro-nodules. Large nodules normally have diameters of 20 mm to 30 mm;
they are more conspicuous in the image and can be detected easily. Small nodules have diameters of
10 mm to 20 mm and are relatively more difficult to detect, with a greater chance of missing a
true nodule or of false detection compared with large nodules. Micro-nodules (very small nodules)
have diameters of 2 mm to 10 mm [15]. Detecting these nodules requires a combination of precise
segmentation, enhancement and classification. In general, the larger the nodule, the higher the
probability of malignancy, and vice versa.
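The size categories above can be expressed as a small lookup. The ranges quoted in the text share their endpoints (2-10, 10-20 and 20-30 mm), so this sketch assumes half-open intervals, counting exactly 10 mm as small and exactly 20 mm as large; the function name `size_category` and its returned labels are hypothetical.

```python
def size_category(diameter_mm):
    """Map a nodule diameter in mm to the size classes described above.

    Assumes half-open ranges [2, 10), [10, 20) and [20, 30], because the
    ranges quoted in the text share their endpoints.
    """
    if not 2 <= diameter_mm <= 30:
        return "outside nodule range"
    if diameter_mm < 10:
        return "micro"
    if diameter_mm < 20:
        return "small"
    return "large"


for d in (4.5, 10, 17, 20, 28):
    print(d, size_category(d))
```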
(2) Location of lung nodules: As lung cancer progresses, it can obstruct the blood vessels or
bronchi in the nodule area, reducing the oxygen supply to the lungs and ultimately causing death.
The position of a nodule therefore conveys useful information to the doctor and is also an
important index in image diagnostics. By location, nodules are divided into four types, which fall
into three groups: solitary nodules (solitary pulmonary nodules), pleural adhesion nodules
(juxta-pleural and pleural-tail nodules) and vascular adhesion nodules (juxta-vascular nodules)
[14]. The characteristics of these four types differ. Solitary nodules lie in the parenchymal area
of the lungs, where the density of the surrounding tissue differs from that of the bronchi;
however, this type of nodule can easily be confused with a cross-section of a blood vessel. Pleural
adhesion nodules are in contact with the pleura and, according to the degree of contact, can be
further divided into two types. The degree of adhesion is difficult for doctors to judge: a nodule
attached too broadly to the pleura may be mistaken for the extrapleural region, while one lying
farther from the pleura may be misjudged as a blood vessel. Vascular adhesion nodules, which are
attached to an adjacent vessel, are similar to pleural adhesion nodules and are difficult to detect
and diagnose within the adjacent tissue structure. Since each type has its own detection
difficulty, various methods should be combined to detect them precisely.
(3) Pulmonary nodule density: CT values differ between lung tissues, but pulmonary nodules and the
blood vessels of the lung parenchyma have almost the same density. Depending on density, nodules
can be divided into solid, part-solid and non-solid nodules. Among them, ground-glass opacity (GGO)
nodules are the most difficult to detect because they have relatively low density and a small
extent and show a blurred, fuzzy shape. Although these fuzzy nodules can display benign traits such
as slow growth, this type of nodule is in general associated with malignancy, and computer-aided
detection systems often find it difficult to detect.
(4) Pulmonary nodule edge: The edge contour of pulmonary nodules is also one of the
important indices for distinguishing benign from malignant nodules. Usually, benign pulmonary
nodules have a smooth edge without obvious lobulation or spiculation (burr signs), whereas
malignant pulmonary nodules typically show irregular edges, spiculation or lobulation.
In summary, the characteristics of pulmonary nodules are diverse, and the difficulty of detection
arises from the four aspects described above. These aspects cover features such as an uneven
edge contour with spicules (burr signs) extending to the periphery, which are scattered,
unbranched and have long pointed tips, as well as extended, vague hyperemia of the
surrounding area. Further, the shape of pulmonary nodules varies: the contour surface is uneven
and may be connected to tissues of similar density such as blood vessels. Nodules also vary in
size from very large to very small, which makes it difficult to determine whether a region is a
blood vessel or a nodule; this is a major problem in the detection process.
1.3 Computer Aided Detection
Computer Aided Detection (CAD) can help in early lung nodule detection. In radiology,
computer-aided detection refers to procedures that assist doctors in the interpretation of
medical images [16]. With the rapid growth of medical imaging technologies, CAD has become
increasingly important. Medical imaging allows scientists and physicians to collect potentially
life-saving information by peering noninvasively into the human body. As medical imaging
plays an increasingly important role in the identification and treatment of disease, the medical
image analysis community has become preoccupied with the demanding problem of extracting,
with the assistance of computers, clinically useful information about anatomic structures
imaged through CT, MRI, PET and other modalities. Although modern imaging devices
provide excellent views of internal anatomy, the use of computers to enumerate and examine
the embedded structures accurately and effectively remains limited. To support the spectrum of
biomedical investigation and medical activities, from diagnosis to radiotherapy to surgery,
accurate, repeatable, quantitative data must be extracted efficiently. The main idea of CAD is
therefore the extraction of regions of interest with high accuracy [17,18]. CAD meets three
main objectives:
Improve the quality of diagnosis
Increase therapy success by early detection of cancer
Avoid unnecessary biopsies.
With the development of medical imaging technology, different imaging modalities have been
used, including X-ray, CT (Computed Tomography), MRI (Magnetic Resonance Imaging) and
PET (Positron Emission Tomography), of which X-ray is the oldest. Computed Tomography
provides more detailed and accurate information about anatomic structures and has greatly
improved the detection rate of lung cancer nodules. Therefore, CT has become the most
effective diagnostic test for detecting lung cancer [19].
At present, multi-slice spiral CT [20] can detect pulmonary nodules as small as 1 mm in
diameter. Multi-slice spiral CT scanners are available with 16, 32 or 64 detector rows, and a
64-row scanner can produce hundreds of images per scan: typically 150 to 200, and as many as
a thousand. Reading all these images is repetitive work that places a great burden on the doctor
and increases the chance of misdiagnosis due to fatigue. Adding to this, the lung has a complex
structure of vascular and bronchial tissue. These structures closely resemble pulmonary nodules
in shape (e.g. circular cross-sections), and nodules themselves take diverse forms: tuberculous
nodules of various types, nodules with lobulation or calcification, nodules adhering to blood
vessels or the bronchi, and solitary nodules. This diversity increases the difficulty of diagnosis.
CAD can assist doctors by taking over much of this repetitive work, improving diagnostic
efficiency and accuracy. CAD systems scan digital images, e.g. from computed tomography, for
typical appearances and highlight conspicuous regions, such as possible lesions. Currently this
screening is done manually, and the results may vary due to increasing workload, human error
and negligence. This research area has therefore generated a great deal of interest. A CAD
system integrates image processing, pattern recognition, machine learning and data mining with
doctors' clinical experience of lung cancer, automating information extraction and comparison
to assess a patient's condition. With advances in technology, a CAD system can serve as a
second opinion for the doctor's diagnosis [21].
Because doctors are mainly concerned with lesions in the pulmonary parenchyma, and these
lesions appear in the form of lung nodules, the aim of a lung CAD system is to automatically
segment the lung parenchyma from a patient's CT image, extract the regions of interest
(suspected lesion areas) within the parenchyma, and identify lung nodules in these regions.
Whether a region is malignant is then determined in combination with additional diagnostic
information and the physician's experience to reach the final diagnostic conclusion.
Experimental results show that CAD systems can improve the accuracy of lung cancer diagnosis
by at least 15% [22].
Because the five-year survival rate of lung cancer differs greatly with the stage at which it is
diagnosed, early identification and early treatment are important. However, a hospital can face
several issues, including (1) hospital costs, (2) training of experienced radiologists and (3) the
diagnostic workload of doctors. With the rapid development of computer technology, CAD
systems can be introduced into the medical diagnostic process. The application of CAD to the
detection of pulmonary nodules has improved sensitivity and specificity, and its application in
lung cancer screening can speed up diagnostic work [21].
Research in the field of CAD has grown considerably. With the rapid development of computer
technologies and the urgent needs of medical diagnosis, CAD systems have gradually become
one of the hot research topics in the medical industry and in universities in Europe and the
United States of America. CT-based products in commercial use include R2 Technology's
ImageChecker CT, FDA-certified in 2004 [23], Siemens’ syngo.CT Lung CAD [24] and Veolity
by MeVis Medical Solutions [25].
1.4 Organization of Thesis
In this dissertation, we have proposed a novel computer aided detection scheme for lung
nodules using a hybrid feature set, which increases the overall sensitivity of the system and
reduces the false positives per scan. This research focuses on segmentation of the lung
parenchyma in DICOM (Digital Imaging and Communications in Medicine) images, followed
by nodule candidate detection and classification of the candidates into nodules and non-nodules.
First, image preprocessing is applied to remove noise from the input CT image. Then a series of
operations is performed to segment the main lung region, reducing the viewing area for the
doctors. After this, the image is enhanced for better visualization of the region of interest (ROI).
Next, the candidate nodules are detected and the false positives are reduced using a classifier.
The main contents of this dissertation are as follows:
The first chapter is the introduction. It describes the background and significance of the
subject, the different types of lung nodules and their characteristics, the importance of CAD in
lung cancer detection and the research being carried out in this field. Finally, it presents the
organization of this thesis.
The second chapter presents a detailed literature review of lung nodule detection methods. A
lung nodule detection method normally consists of image acquisition, lung segmentation,
nodule detection and classification. In this chapter, we discuss the salient techniques in the
literature, analyze their advantages and disadvantages, and conclude with the problem statement
that motivated the work in this thesis.
The third chapter describes in detail the proposed methodology, which begins with automatic
segmentation of the lungs from the input CT image and removal of the background. The
segmented lung image is smoothed using morphological operations to include any juxta-pleural
nodules present in the lung region. After this, the image is enhanced using a multiscale dot
enhancement filter based on the Hessian matrix; this enhancement reinforces all types of nodules
to a certain extent. Next, the candidate nodules are detected using optimal thresholding on the
dot-enhanced images, and a rule-based analysis is applied to retain only good nodule candidates.
A hybrid feature set is extracted from the nodule candidates, and an SVM classifier is used to
reduce the false positives by classifying the candidates into nodules and non-nodules.
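The multiscale Hessian-based dot enhancement mentioned above can be illustrated with a minimal 2D sketch. It assumes the commonly used dot-filter response |λ2|²/|λ1|, computed only where both Hessian eigenvalues are negative (a bright blob) with |λ1| ≥ |λ2|, normalized by σ² and maximized over scales. The function names, the scales (1, 2, 3) and the Gaussian truncation at 3σ are illustrative assumptions; the thesis works on 3D CT data, and the filter it actually uses is described in Chapter 3.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at 3 sigma and normalized to unit sum."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def smooth(image, sigma):
    """Separable Gaussian smoothing along columns, then rows (zero padding at borders)."""
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, image)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, out)

def dot_filter_2d(image, sigmas=(1.0, 2.0, 3.0)):
    """Multiscale Hessian dot enhancement (illustrative 2D sketch, hypothetical name)."""
    image = np.asarray(image, dtype=float)
    response = np.zeros_like(image)
    for sigma in sigmas:
        s = smooth(image, sigma)
        gy, gx = np.gradient(s)          # derivatives along rows (y) and columns (x)
        hyy, hyx = np.gradient(gy)
        hxy, hxx = np.gradient(gx)
        hxy = 0.5 * (hxy + hyx)          # symmetrize the mixed derivative
        # Closed-form eigenvalues of the symmetric 2x2 Hessian, e1 <= e2
        mean = 0.5 * (hxx + hyy)
        root = np.sqrt((0.5 * (hxx - hyy)) ** 2 + hxy ** 2)
        e1, e2 = mean - root, mean + root
        # Bright blob: both eigenvalues negative, e1 has the larger magnitude
        blob = (e1 < 0) & (e2 < 0)
        z = np.zeros_like(s)
        z[blob] = e2[blob] ** 2 / np.abs(e1[blob])
        # sigma^2 normalization makes responses comparable across scales
        response = np.maximum(response, sigma ** 2 * z)
    return response
```

For a bright line, one eigenvalue is close to zero, so the response stays near zero, while a round blob of similar width responds strongly; this selectivity is what lets a dot filter favour nodule-like structures over vessel-like ones.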
The fourth chapter presents the results and discussion. We have carried out an extensive
evaluation of the proposed system on the Lung Image Database Consortium (LIDC) dataset,
which is publicly available from The Cancer Imaging Archive (TCIA). We have considered 850
scans (LIDC-IDRI-0001 to LIDC-IDRI-0844) of this dataset, which contain nodules of size
3-30 mm fully annotated by four expert radiologists in two consecutive sessions. The overall
sensitivity is improved compared with previous methods, and the FP/scan is reduced
significantly. Finally, the conclusions and recommendations of the dissertation, which provide
the basis for future research, are presented in Chapter 5.
Chapter 2: LITERATURE REVIEW
This chapter presents a detailed literature review of computer aided lung nodule detection
schemes. The review is divided into three sections. After a brief introduction to image
acquisition and the commonly available datasets, the first section presents the lung volume
segmentation techniques reported in the literature. The second section presents the lung nodule
detection techniques reported in the literature. The last section briefly reviews related work on
nodule classification and feature extraction, highlighting the challenges that motivated the work
in this dissertation. Computer Aided Detection (CAD) can play an important role in aiding early
detection of cancer and increasing detection sensitivity [5,26].
Figure 2-1: The lung nodule detection process, consisting of image acquisition followed by lung
segmentation, nodule detection and false positive reduction.
A block diagram of the typical lung CAD process is shown in Figure 2-1, and the steps involved
are briefly explained below. Image acquisition is the process of acquiring medical images from
imaging modalities [13]. Several methods are available for lung imaging, and Computed
Tomography (CT) stands out among them as the key modality for the primary analysis of lung
nodules in screening. The Lung Image Database Consortium (LIDC) [27] stands out among the
available public databases due to the standard radiological annotations provided with the
images and its
widespread use. Other databases include the Early Lung Cancer Action Program (ELCAP)
Public Lung Image Database [28] and the ELCAP Public Lung Database to Address Drug
Response [29].
2.1 Lung Segmentation
Lung segmentation can be defined as the process of extracting the lung volume from the input
CT image and removing the background and other irrelevant components. It is a prerequisite for
nodule detection, and accurate lung segmentation plays an important role in the efficiency of a
lung nodule detection system. Numerous methods have been proposed in the literature for
extracting the lung volume from CT images, such as optimal thresholding, rule-based region
growing, global thresholding, 3D adaptive fuzzy thresholding, hybrid segmentation and
connected component labeling. After the initial segmentation, the extracted lung volume is
refined to include juxta-pleural nodules, generally using a chain-code method, a rolling ball
algorithm or morphological approaches [30-38].
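As a hedged illustration of the thresholding and connected component labeling steps listed above, the sketch below segments the lungs in a single 2D slice: it thresholds air-like pixels, discards air connected to the image border (the air surrounding the body) and keeps the largest remaining components. The -400 HU threshold, keeping two components and the function names are hypothetical choices for illustration, not taken from any of the cited methods; the refinement for juxta-pleural nodules is omitted.

```python
from collections import deque
import numpy as np

def label_components(mask):
    """4-connected component labeling of a 2D boolean mask by breadth-first search."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already assigned to a component
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = current
                    queue.append((nr, nc))
    return labels, current

def lung_mask_from_slice(hu_slice, threshold=-400, keep=2):
    """Hypothetical helper: threshold, drop border-connected air, keep the largest parts."""
    air = np.asarray(hu_slice) < threshold
    labels, n = label_components(air)
    edges = np.concatenate([labels[0], labels[-1], labels[:, 0], labels[:, -1]])
    border = {int(v) for v in np.unique(edges)}
    sizes = [(int(np.sum(labels == k)), k) for k in range(1, n + 1) if k not in border]
    chosen = [k for _, k in sorted(sizes, reverse=True)[:keep]]
    return np.isin(labels, chosen)
```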
Lung segmentation techniques can be broadly classified into four categories: (i) shape models,
(ii) edge-based techniques, (iii) thresholding and (iv) deformable boundaries. The following
sections review selected studies from each of these categories.
2.1.1 Shape-Based Techniques
This section presents a group of papers that used shape-based techniques for lung
segmentation. In 2005, Sluimer et al. [34] developed an automatic shape-based lung
segmentation method based on registration, in which the pathological scan was elastically
registered to a normal scan. The method was evaluated using 26 three-dimensional thin-slice
CT scans, of which 10 scans with high-density pathology were used as test data, and the results
were compared with manually traced contours as ground truth. The method achieved an overlap
measure of 0.8165 and a mean absolute surface distance of 1.48 mm.
In 2011, Besbes and Paragios [39] proposed a graph-based shape model with image cues
based on boosted features for automatic lung segmentation. The prior constraints were encoded
using the normalized Euclidean distance between pairs of control points, and the graph
topology was deduced using manifold learning and unsupervised clustering. Segmentation was
cast as a labelling task in which the extracted image points were matched to model landmarks.
An additional label for outliers was added to handle missing correspondences, and the outliers
were then repaired to complete the segmentation. The method was evaluated on the
segmentation of the right lung in a publicly available dataset of 247 chest radiographs, with
gold-standard segmentations by expert radiologists as ground truth. It achieved an overlap
measure of 0.9474 with a mean absolute surface distance of 1.39 pixels.
In 2011, Sofka et al. [40] proposed a multi-stage learning method that combines anatomical
information to predict the initialization of a statistical shape model of the lungs. The
initialization first detects the base of the trachea and uses it to automatically select stable
landmarks in the area near the lungs, such as the ribs and spine. These landmarks are used to
align the shape model, which is then refined through boundary detection to obtain a
fine-grained segmentation. Robustness is achieved using discriminative classifiers, arranged
hierarchically and trained on manually annotated data of diseased and healthy lungs. The
method was evaluated on 260 scans, and the results were compared with 68 contours manually
traced by expert radiologists. The symmetrical point-to-mesh comparison error (SCD) of the
algorithm was 1.95.
In 2012, Sun et al. [41] proposed a robust active shape model (RASM) for automatic lung
segmentation. The method consisted of two steps. First, the lung contours were roughly
segmented with the RASM, whose initial position was found with the help of a rib cage
detection method. Second, the segmentation was refined by an optimal surface finding
approach. The right and left lungs were segmented separately. The method was evaluated on 30
scans with 20 healthy and 40 diseased right/left lungs, and the results were compared with
contours manually traced by experts as ground truth. The method achieved a Dice similarity
coefficient of 0.975 with a mean absolute surface distance error of 0.84 mm.
In 2014, Mansoor et al. [42] proposed a novel two-stage pathological lung segmentation
method. In the first stage, fuzzy connectedness was applied to segment the lungs while, in
parallel, rib cage information was used to estimate the lung volume. The two lung volumes were
then compared, with any difference between them indicating the presence of pathology. In the
second stage, texture-based features were computed to refine the segmentation and include any
abnormalities missed in the first stage. In addition, a neighboring-anatomy-based approach was
used to include low-density weak abnormal structures and juxta-pleural nodules. The method
was evaluated on public and private datasets. It produced an average overlap score of 95% on a
private dataset of 400 CT scans, with 96.84% sensitivity and 92.27% specificity. To validate the
results, it was also evaluated on the publicly available Lobe and Lung Analysis 2011 (LOLA11)
challenge dataset of 55 scans, achieving a mean overlap score of 0.955 with the right and left
lungs evaluated separately.
In 2015, Dai et al. [43] proposed a lung segmentation method using a graph cut algorithm and
a Gaussian mixture model (GMM). The method eliminated the need for post-processing
techniques for lung contour smoothing, such as morphological operations and the rolling ball
algorithm. The foreground and background were first modelled as GMMs, and the
expectation-maximization (EM) algorithm was used to calculate the weight of each pixel
belonging to the foreground. These weights defined the nodes and edges of the corresponding
graph, and the segmentation was completed using minimum cut theory. The method was
evaluated on chest CT images provided by the General Hospital of Ningxia Medical University,
and the results were compared with manual ground truth by expert radiologists. It achieved a
mean Dice similarity coefficient of 0.9874 with a standard deviation of 0.0070.
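The GMM/EM stage of the method above can be sketched on a 1D intensity distribution. The sketch fits a two-component Gaussian mixture with EM and returns, for each pixel, the posterior probability of belonging to the darker component, which is assumed here to be the air-filled lung (foreground). The initialization from the 25th/75th percentiles, the stopping tolerance and the function names are assumptions, and the subsequent graph construction and minimum cut are not shown.

```python
import numpy as np

def gmm_em_1d(values, n_iter=100, tol=1e-8):
    """Fit a two-component 1D Gaussian mixture with EM.

    Returns (weights, means, variances, responsibilities), where
    responsibilities[:, k] is the posterior probability of component k
    for each sample (from the last E-step).
    """
    x = np.asarray(values, dtype=float).ravel()
    means = np.percentile(x, [25, 75]).astype(float)   # assumed initialization
    variances = np.full(2, x.var() + 1e-6)
    weights = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted Gaussian densities, then normalize to posteriors
        dens = (weights * np.exp(-(x[:, None] - means) ** 2 / (2 * variances))
                / np.sqrt(2 * np.pi * variances))
        total = dens.sum(axis=1, keepdims=True) + 1e-300
        resp = dens / total
        # M-step: re-estimate mixing weights, means and variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
        ll = np.log(total).sum()          # log-likelihood before this M-step
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances, resp

def foreground_weights(image):
    """Hypothetical helper: per-pixel posterior of the darker (lung/air) component."""
    _, means, _, resp = gmm_em_1d(image)
    dark = int(np.argmin(means))
    return resp[:, dark].reshape(np.shape(image))
```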
In 2017, Soliman et al. [44] proposed a shape-based automatic lung segmentation method
using adaptive appearance-guided shape modelling. The method combined two visual
appearance submodels and an adaptive shape submodel into a spatially inhomogeneous 3D
Markov-Gibbs random field (MGRF). The first-order visual submodel specified the local and
global signal properties using the input signal and its Gaussian scale space (GSS) filtered
versions. A linear combination of discrete Gaussians (LCDG) was used to closely approximate
the empirical probability distribution of each signal, and each approximation was separated into
two LCDGs representing the lungs and the background. The second-order visual submodel
quantified the intensity dependencies in both the original and the GSS-filtered images. The
shape submodel, learned from a training dataset, adapted the shape appearance during the
segmentation phase. The method was evaluated on three datasets: a private dataset of 30 CT
scans and the publicly available “VESSEL” and LOLA11 datasets of 20 and 55 CT scans,
respectively. Compared with the ground truth, it achieved an overlap ratio of 0.98.
2.1.2 Edge-Based Techniques
The following section presents studies that used edge-based techniques for lung segmentation.
In 2004, Mendonca et al. [45] proposed an automatic method for 2D lung segmentation using
edge detection in the spatial domain. Segmentation was completed in two stages. In the first
stage, two regions of interest (ROIs) were determined, each indicating a lung field. In the
second stage, each ROI was analyzed for accurate detection; for this purpose, the image was
first smoothed with a 9×9 averaging filter. The method was evaluated on 47 chest radiographs,
and the results were compared with ground truth obtained by manual tracing of the lung borders
by an experienced radiologist. The achieved sensitivity was 92.25%.
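The 9×9 averaging filter used for smoothing above can be computed efficiently with an integral image (summed-area table), at a cost independent of the window size. This is a minimal sketch; replicating the border pixels (`mode="edge"`) is an assumed border policy, since the handling of image borders is not stated, and `box_filter` is a hypothetical name.

```python
import numpy as np

def box_filter(image, size=9):
    """Mean filter over a size x size window using an integral image."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    r = size // 2
    # Replicate border pixels so every output pixel has a full window (assumption)
    padded = np.pad(np.asarray(image, dtype=float), r, mode="edge")
    # Integral image with a leading row and column of zeros
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = np.shape(image)
    # Window sum = four-corner combination of the integral image
    total = (ii[size:size + h, size:size + w] - ii[:h, size:size + w]
             - ii[size:size + h, :w] + ii[:h, :w])
    return total / (size * size)
```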
In 2005, Yim et al. [46] proposed a lung volume segmentation method based on region
growing and connected component labeling. The method first extracted the lung region and
airways via inverse seeded region growing and connected component labeling. The trachea and
airways were then delineated from the lungs by three-dimensional region growing; in this second
step, median filtering was used for preprocessing and morphological operations for
post-processing. Finally, the lungs were extracted by subtracting the result of the second step
from that of the first. The method was evaluated on 10 subjects, and the results were compared
with 10 manually traced contours as ground truth. The root mean square difference between the
method and the ground truth was 1.2 pixels.
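Seeded region growing, used in the method above, can be sketched as a breadth-first search that adds 4-connected neighbours whose intensity lies within a given range. This 2D version with a fixed intensity interval and the name `region_grow` are illustrative assumptions; it does not reproduce the inverse seeded region growing or the 3D airway delineation of Yim et al.

```python
from collections import deque
import numpy as np

def region_grow(image, seed, low, high):
    """Grow a 4-connected region from `seed` over pixels with low <= value <= high."""
    image = np.asarray(image)
    mask = np.zeros(image.shape, dtype=bool)
    if not (low <= image[seed] <= high):
        return mask  # seed itself fails the criterion: empty region
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < image.shape[0] and 0 <= nc < image.shape[1]
                    and not mask[nr, nc] and low <= image[nr, nc] <= high):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask
```

In the spirit of the final step described above, a lung mask could then be obtained by subtracting an airway mask from a lung-plus-airway mask, e.g. `lungs = lung_and_airways & ~airways` (both masks hypothetical).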
In 2006, Campadelli et al. [47] developed an automatic lung segmentation method using a
spatial edge detector. After segmentation, the image was enhanced using a multiscale method to
increase the visibility of nodules, and a support vector machine (SVM) classifier with Gaussian
and polynomial kernel functions was used to reduce false positives. The method was evaluated
on a large set of postero-anterior (PA) chest radiographs, and the results were compared with
the ground truth. The system achieved a maximum sensitivity of 92% at 8 FP/image.
In 2007, Korfiatis et al. [48] proposed an automatic lung segmentation method based on a
wavelet edge detector. A two-dimensional wavelet edge highlighter was used as a preprocessing
step to delineate the lung contours. The lung volume was then extracted by three-dimensional
grey-level thresholding with the minimum error technique, and 3D morphological closing with a
spherical structuring element was applied to handle the mediastinal border. The method was
evaluated on 23 scans from the LIDC dataset, and the results were compared with manually
traced lung contours. It achieved an overlap measure of 0.983 with a mean absolute surface
distance of 0.77 mm, and the root mean square difference between the method and the ground
truth was 0.52 mm.
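Minimum error thresholding, referred to above, is commonly formulated as the Kittler-Illingworth criterion: model the histogram as two Gaussian classes and choose the threshold T that minimizes J(T) = 1 + 2[P1 ln σ1 + P2 ln σ2] - 2[P1 ln P1 + P2 ln P2], where Pi and σi are the prior probability and standard deviation of each class. The sketch below evaluates J on a histogram; the bin count and the function name are assumptions, and the 3D details of Korfiatis et al. are not reproduced.

```python
import numpy as np

def minimum_error_threshold(values, bins=256):
    """Kittler-Illingworth minimum error threshold computed on a histogram."""
    hist, edges = np.histogram(np.asarray(values, dtype=float).ravel(), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    best_t, best_j = None, np.inf
    for t in range(1, bins):
        p1, p2 = p[:t].sum(), p[t:].sum()   # class 1 = bins 0..t-1
        if p1 <= 0 or p2 <= 0:
            continue
        m1 = (p[:t] * centers[:t]).sum() / p1
        m2 = (p[t:] * centers[t:]).sum() / p2
        v1 = (p[:t] * (centers[:t] - m1) ** 2).sum() / p1
        v2 = (p[t:] * (centers[t:] - m2) ** 2).sum() / p2
        if v1 <= 0 or v2 <= 0:
            continue  # criterion undefined for a zero-variance class
        # 2 * P * ln(sigma) == P * ln(variance)
        j = 1 + (p1 * np.log(v1) + p2 * np.log(v2)) - 2 * (p1 * np.log(p1) + p2 * np.log(p2))
        if j < best_j:
            best_j, best_t = j, edges[t]
    return best_t
```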
2.1.3 Thresholding-Based Techniques
The following section summarizes the thresholding-based techniques reported in the literature.
In 2001, Hu et al. [49] developed a fully automatic lung segmentation technique using an
iterative threshold method, in which an initial threshold was selected and then updated
iteratively until its value no longer changed. This method exploits the fact that different
structures in lung CT images have different densities. After segmentation, the left and right
lungs were separated using dynamic programming, and the segmented lung volume was then
smoothed using morphological operations. The algorithm was evaluated on eight 3D CT scans,
and the results were compared with borders manually traced by two expert radiologists. The
root mean square error between the automated segmentation and the manually traced borders,
averaged over all volumes, was 0.8 pixels (0.54 mm).
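The iterative threshold selection described above can be sketched as follows: each new threshold is the midpoint of the mean intensities of the voxels below and above the current threshold, and the iteration stops once the threshold no longer changes (here, by less than `tol`). The midpoint update rule, the initial value, the tolerance and the function name are assumptions for illustration.

```python
import numpy as np

def iterative_threshold(values, t0=None, tol=0.5, max_iter=100):
    """Iterative (isodata-style) threshold selection.

    Starting from t0 (the overall mean if None), repeatedly set the threshold
    to the midpoint of the class means below and above it.
    """
    x = np.asarray(values, dtype=float).ravel()
    t = x.mean() if t0 is None else float(t0)
    for _ in range(max_iter):
        below, above = x[x <= t], x[x > t]
        if below.size == 0 or above.size == 0:
            break  # all samples on one side: cannot refine further
        t_new = 0.5 * (below.mean() + above.mean())
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t
```

For example, starting from a hypothetical initial threshold of -500 HU, voxels at or below the returned threshold would form the low-density (lung and air) class.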
In 2007, Gao et al. [50] proposed an accurate, fully automated thresholding-based lung
segmentation technique. Initially, the input CT image was smoothed with anisotropic diffusion,
which preserves edges, and the large airways were removed by region growing. An optimal
thresholding technique was then used to segment the lung volume, and the left and right lungs
were separated using a tracking algorithm. Finally, the lung contour was smoothed using the
rolling ball algorithm. The method was evaluated on eight CT scans of four patients, and the
results were compared with manually traced lung contours. The Dice similarity coefficient of
the method was 0.9946.
In 2016, Shi et al. [51] proposed an automatic thresholding-based lung segmentation method
consisting of a series of steps. First, the input CT image was filtered with a guided filter to
remove noise. The image was then thresholded using Otsu's method. The thorax region was
extracted using region growing, and a seed-based random walk algorithm was used to segment
the lung region from the thorax. Finally, a curvature-based correction method was used to
smooth the lung boundary and include juxta-pleural nodules. The method was evaluated on 23
scans comprising 883 2D slices, and the results were compared with ground truth manually
traced by expert radiologists. It achieved an overlap ratio of 98.4%.
2.1.4 Deformable Boundary Techniques
This section presents a group of papers that used deformable boundary models for lung
segmentation. In 2008, Shi et al. [52] proposed a shape-based deformable model for automatic
lung segmentation. First, the deformable model used scale-invariant feature transform (SIFT)
features, which are more descriptive than other feature classes such as intensity and gradient.
Second, both population-based and patient-specific shape statistics were used to segment the
lung fields from chest radiographs, which yielded better and more robust results. Hierarchical
PCA, rather than global PCA, was used to learn the patient-specific shape statistics; its
advantage is that it increases the degrees of freedom and captures shape diversity more
accurately, even with a small number of training samples. The algorithm was evaluated in two
phases. First, it was evaluated without patient-specific shape statistics on the Japanese Society
of Radiological Technology (JSRT) database of 247 chest radiographs. Second, the complete
algorithm (with patient-specific shape statistics) was evaluated on 39 serial frontal chest
radiographs. The results were compared with manually traced ground truth, and the method
achieved an overlap measure of 0.92 with a mean absolute surface distance of 1.78 pixels.
In 2008, El-Baz et al. [53] proposed a fully automatic lung segmentation method based on a
statistical Markov-Gibbs random field (MGRF) model. A linear combination of discrete
Gaussians (LCDG) with positive and negative components was used to better approximate the
empirical distribution of each signal, and the conventional Expectation-Maximization (EM)
algorithm was modified to handle the LCDG. The method was evaluated on ten real datasets,
and the results were compared with 1820 manually traced images as ground truth. The system
achieved an accuracy of 96.8%.
In 2010, Annangi et al. [54] developed a shape-based deformable model for automatic lung
segmentation. The problem of local minima in active-contour-based lung segmentation was
addressed with a multi-scale feature set that exploits the good contrast at the lung boundary. In
the feature computation phase, an edge map was obtained by applying the Canny edge detector
to the histogram-equalized image. The method was evaluated on 1130 images with ground truth
marked by expert radiologists, and it achieved a Dice similarity coefficient of 0.88 with a
standard deviation of 0.07.
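Histogram equalization, applied above before Canny edge detection, maps each grey level through the normalized cumulative histogram. The sketch assumes a non-negative integer image with 256 levels (a CT slice would first need to be windowed and rescaled); the function name is hypothetical, and the Canny step itself is not shown.

```python
import numpy as np

def equalize_histogram(image, levels=256):
    """Histogram equalization of a non-negative integer image via its CDF."""
    img = np.asarray(image)
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]             # CDF value at the darkest occupied level
    if cdf[-1] == cdf_min:
        return img.copy()                 # constant image: nothing to equalize
    # Stretch the CDF to the full output range and use it as a lookup table
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1))
    return np.clip(lut, 0, levels - 1).astype(np.uint8)[img]
```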
In 2017, Filho et al. [55] proposed an automatic lung segmentation method using the 3D
Adaptive Crisp Active Contour Method (3D ACACM). Initially, a sphere was placed within the
lung and deformed by outward-acting forces, with the energy function minimized iteratively in
the deformable model. The main contribution of this method was the calculation of a 3D
Adaptive Crisp external energy, used to locate the lung edges, and a 3D Adaptive Balloon
internal energy, used to expand the segmentation. The topology of each point and information
from neighboring slices were used to calculate this force. A robust 3D automatic initialization
technique was also proposed, which automatically places seed points in the right and left lungs.
The method was evaluated on 40 CT scans and compared with commonly used approaches
such as 3D region growing, the level set algorithm and semi-automatic segmentation by an
expert. It achieved an F-measure (FM) of 99.14% ± 0.18, where FM denotes the harmonic mean
of positive predictive value and sensitivity. The lung segmentation techniques reviewed above
are summarized in Table 2-1.
Table 2-1: Review of Lung Segmentation Techniques1
1 * NA means Not available, OM means overlap measure and is defined as the volume of the intersection divided
by the volume of the union of two samples, DSC means Dice similarity coefficient and is used for comparing the
similarity of two samples, FM means F-measure and denotes the harmonic mean of predictive value and
sensitivity, RmsD means the root mean square difference of the Distance between the segmentation and the ground
truth, SCD means symmetrical point-to-mesh comparison error, AD means the mean absolute surface distance
and is defined as symmetric border positioning measure integrated along the entire surfaces.
CAD Systems | Year | No. Cases | Image Size | Proposed Technique | Ground Truth | Performance
Soliman et al. [44] | 2017 | 105 | 512 × 512 × 270–450 | Shape-based | 75 manually traced scans | OM = 0.98, DSC = 98.4%
Filho et al. [55] | 2017 | 40 CT scans | 512 × 512 | Shape-based deformable model | Semi-automatic (manual + commercial software) | FM = 99.14%
Shi et al. [51] | 2016 | 23 CT scans | 512 × 512 | Thresholding | 23 manually traced data | OM = 0.984
Dai et al. [43] | 2015 | NA | 512 × 512 × 368 | Shape-based | Manually traced data | DSC = 0.9874
Mansoor et al. [42] | 2014 | 400 CT images | NA | Shape-based | 400 manually traced data | OM = 0.955
Sun et al. [41] | 2012 | 30 scans | 512 × 512 × 424–642, 0.6–0.7 mm thin | Shape-based | 30 manually corrected traced data | DSC = 0.975, AD = 0.84 mm
Sofka et al. [40] | 2011 | 260 scans | 0.5–5.0 mm | Shape-based | 68 manually traced data | SCD = 1.95
Besbes and Paragios [39] | 2011 | 247 image radiographs | 256 × 256, 1 mm thin | Shape-based | 123 manually traced data | OM = 0.9474, AD = 1.39 pixel
Annangi et al. [54] | 2010 | 1130 image radiographs | 128 × 128 and 256 × 256 | Shape-based deformable model | 1130 manually traced images | DSC = 0.88
El-Baz et al. [53] | 2008 | 10 image datasets | 512 × 512 × 182, 2.5 mm thin | Statistical MGRF model | 1820 manually traced images | Accu. = 0.968
Shi et al. [52] | 2008 | 247 image radiographs | 256 × 256 | Shape-based deformable model | 247 manually traced images | OM = 0.92, AD = 1.78 pixel
Gao et al. [50] | 2007 | 8 subjects | 512 × 512 × 240 | Thresholding | 8 manually traced datasets | DSC = 0.9946
Korfiatis et al. [48] | 2007 | 23 scans | 512 × 512 | Wavelet edge detector | 22 manually traced data | OM = 0.983, AD = 0.77 mm
Campadelli et al. [47] | 2006 | 487 image radiographs | 256 × 256 | Spatial edge detector | 487 manually traced data | Sen. = 0.9174, Spec. = 0.9584
Sluimer et al. [34] | 2005 | 26 scans | 512 × 512, 0.75–2.0 mm | Shape-based | 10 manually traced data | OM = 0.8165, AD = 1.48 mm
Yim et al. [46] | 2005 | 10 subjects | 3D, 512 × 512, 0.75–2 mm thin | Region growing | 10 manually traced data | RmsD = 1.2 pixel
Mendonca et al. [45] | 2004 | 47 image radiographs | 2D, NA | Spatial edge detector | 47 manually traced data | Sen. = 0.9225
Hu et al. [49] | 2001 | 24 datasets | 512 × 512, 3 mm thin | Iterative threshold | 229 manually traced images | RmsD = 0.54 mm

In summary, each technique has its own pros and cons. Threshold-based techniques perform very well on high-contrast CT images, but their performance can vary for low-contrast pathologies. Thresholding can also be affected by different imaging protocols and image acquisition scanners. Moreover, lung structures such as blood vessels, bronchioles and bronchi have densities so close to those of the surrounding chest tissue that it is very difficult to threshold the region of interest accurately, and special post-segmentation processing is required for accurate segmentation. Deformable boundary-based techniques have the disadvantage of high sensitivity to initialization. Furthermore, with traditional external forces such as edges and gray levels, they are unable to overcome the inhomogeneity of the lung volume, which makes it difficult to achieve accurate lung segmentation by guiding the deformable model. The accuracy of shape-based segmentation techniques depends on accurate registration of the prior shape model to the CT image. Poor registration can degrade the overall performance, and this is the main limitation of shape-based techniques. Furthermore, the diversity of lung pathologies makes it difficult to segment the lung fields accurately [56].
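The overlap scores reported in Table 2-1 can be computed from pixel counts of two binary masks. The sketch below is a minimal illustration, assuming OM is the intersection-over-union (Jaccard) overlap, which is a common definition but is not confirmed by the table; the function name `overlap_metrics` is hypothetical, and masks are nested lists of 0/1 values.

```python
def overlap_metrics(seg, truth):
    """Return (DSC, OM, FM) for two equally sized binary masks (nested lists of 0/1)."""
    tp = fp = fn = 0
    for row_s, row_t in zip(seg, truth):
        for s, t in zip(row_s, row_t):
            if s and t:
                tp += 1          # pixel in both masks
            elif s:
                fp += 1          # segmented but not in the ground truth
            elif t:
                fn += 1          # ground-truth pixel that was missed
    if tp + fp + fn == 0:        # both masks empty: treat as perfect agreement
        return 1.0, 1.0, 1.0
    dsc = 2 * tp / (2 * tp + fp + fn)           # Dice similarity coefficient
    om = tp / (tp + fp + fn)                    # assumed: intersection over union
    ppv = tp / (tp + fp) if tp + fp else 0.0    # positive predictive value
    sen = tp / (tp + fn) if tp + fn else 0.0    # sensitivity
    fm = 2 * ppv * sen / (ppv + sen) if ppv + sen else 0.0
    return dsc, om, fm


seg = [[1, 1, 0],
       [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(overlap_metrics(seg, truth))
```

Because FM is computed here from pixel-level positive predictive value and sensitivity, it coincides with DSC for binary masks; published values may still differ when a study defines these quantities per object or per slice.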
2.2 Lung Nodule Detection
Nodule detection can be defined as the process of detecting suspicious areas in the lung region that may indicate lung cancer. It is performed after lung segmentation, which reduces the workload by removing the background and unwanted areas from the input CT image. Various methods for lung nodule candidate detection have been presented in the literature, among which multiple gray-level thresholding stands out. Shape-based, template-matching-based, morphological approaches with convexity models and filtering-based methods have also been used for this purpose. In the following, we review selected studies on lung nodule candidate detection.
This section presents studies that have used different variants of thresholding for lung nodule candidate detection. Akram et al. [57], Ko and Betke [58] and Zhao et al. [59] applied multiple gray-level thresholding for nodule candidate detection. They argued that a single threshold value cannot be used because vessels and different types of nodules may have different density values, so multiple threshold values were calculated for candidate nodule detection. Choi and Choi [12] used a multi-scale dot enhancement filter for lung nodule candidate detection. They argued that since nodules appear as circular or dot-like objects of varying size, a single scale cannot appropriately enhance all nodules, whereas a multi-scale dot enhancement filter can detect the candidate nodules efficiently. After enhancement, the lung nodules were detected using thresholding. Gonçalves et al. [60], Chen et al. [61] and Li and Doi [62] proposed Hessian matrix based approaches for lung nodule detection. Gonçalves et al. [60] proposed the use of the central adaptive medialness principle for lung nodule identification and segmentation, using the shape index and curvedness properties to identify lung nodule candidates. The method was evaluated on 569 solid nodules of the LIDC-IDRI dataset and showed good results when compared with the ground truth of manual segmentation by expert radiologists. Choi and Choi [18] proposed an entropy-based lung nodule detection system consisting of three stages. In the first stage, the input CT image is divided into informative and non-informative blocks, and the non-informative blocks are filtered out. In the next stage, the informative blocks are enhanced using 3-D coherence-enhancing diffusion, and the candidate nodules are detected from the enhanced blocks using optimal thresholding. Finally, certain features are extracted from the lung nodule candidates and an SVM is used for false positive reduction.
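To make the multiple gray-level thresholding idea concrete, the following Python sketch binarizes a 2D slice at several thresholds, labels 4-connected regions at each level, and keeps regions whose area falls in a plausible nodule range. The threshold list, the area limits and the function names are hypothetical choices for illustration, not the settings used in [57-59].

```python
from collections import deque


def label_components(mask):
    """Return the 4-connected foreground regions of a binary mask as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def multi_threshold_candidates(image, thresholds, min_area=2, max_area=50):
    """Union of size-filtered regions found by thresholding the slice at several levels."""
    candidates = set()
    for t in thresholds:
        mask = [[1 if v >= t else 0 for v in row] for row in image]
        for pixels in label_components(mask):
            if min_area <= len(pixels) <= max_area:
                candidates.add(frozenset(pixels))   # identical regions are kept once
    return candidates


slice_ = [[0,   0,   0, 0,  0, 0],
          [0, 200, 200, 0,  0, 0],
          [0, 200, 200, 0, 90, 0],
          [0,   0,   0, 0, 90, 0],
          [0,   0,   0, 0,  0, 0]]
print(len(multi_threshold_candidates(slice_, [150])))      # 1: dense blob only
print(len(multi_threshold_candidates(slice_, [150, 80])))  # 2: fainter blob found too
```

The second call shows the argument made above: a single high threshold misses the fainter region, while the lower level recovers it.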
This section groups studies that have used different variants of template matching for lung nodule candidate detection. Hasanabadi et al. [63], Wiemker et al. [64] and Lee et al. [65] proposed lung nodule detection systems using template matching. The method of Hasanabadi et al. [63] consisted of three main steps. First, the lung is segmented from the input CT image using thresholding and morphological operations. In the second step, lung nodule candidates are detected using template matching and thresholding: sixty nodule patterns were extracted from the LIDC dataset, a similarity measure was computed between each detected region and these templates, and regions exceeding a certain threshold were marked as nodule candidates. Finally, in the third step, false positives were reduced using a feed-forward neural network classifier. The system was evaluated using 7 CT scans from the LIDC dataset. El-Baz et al. [66] also used 2D and 3D deformable templates and a genetic optimization algorithm to detect the lung nodule candidates.
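The similarity test used in template-based candidate detection can be sketched with normalized cross-correlation (NCC). NCC is an assumed choice of similarity measure here, since the measure used in [63] is not specified in this review; the template, the 0.8 threshold and the function names are hypothetical.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2D patches, in [-1, 1]."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:                     # flat patch or template: correlation undefined
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def match_templates(image, templates, threshold=0.8):
    """Top-left corners where the best NCC over all templates reaches the threshold.

    All templates are assumed to share the same height and width.
    """
    h, w = len(templates[0]), len(templates[0][0])
    hits = []
    for r in range(len(image) - h + 1):
        for c in range(len(image[0]) - w + 1):
            patch = [row[c:c + w] for row in image[r:r + h]]
            if max(ncc(patch, t) for t in templates) >= threshold:
                hits.append((r, c))
    return hits


blob = [[0, 50, 0],
        [50, 100, 50],
        [0, 50, 0]]
image = [[0, 0, 0, 0, 0],
         [0, 0, 50, 0, 0],
         [0, 50, 100, 50, 0],
         [0, 0, 50, 0, 0],
         [0, 0, 0, 0, 0]]
print(match_templates(image, [blob]))
```

Because NCC subtracts the mean and divides by the spread, the score does not change when a candidate is uniformly brighter or darker than the template, which is why it is a frequent default for this kind of matching.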
This section presents studies that have used morphological approaches for lung nodule candidate detection. Cascio et al. [67] proposed a lung nodule detection method using a 3D mass-spring model. The system used region growing and morphological operations for lung volume segmentation, and the lung nodule candidates were detected using a 3D mass-spring model, whose gray-value range and shape information helped to identify the candidates accurately. The system was evaluated using 84 scans of the LIDC dataset. Soltaninejad et al. [68] proposed a lung nodule detection scheme using active contours and a KNN classifier. The scheme segmented the lung volume using adaptive thresholding and morphological operations; lung nodule candidates were then detected using 2D stochastic features and extracted using active contour modeling. Finally, false positives were reduced using the KNN classifier. Jiantao et al. [69] proposed a shape-based lung nodule detection method consisting of three main steps: modeling, break and repair. First, the regions of interest are extracted and represented as a shape model using the Marching Cubes algorithm. Next, problematic regions that can lead to inaccurate segmentation of the object are identified and removed using principal curvature analysis. Finally, the incomplete regions are fitted by interpolation and extrapolation with radial basis functions to estimate and repair the suspicious area smoothly. The system was evaluated using 230 chest CT scans. Kubota et al. [70] proposed a lung nodule detection method using morphological operations and convexity models. The system consisted of multiple stages. First, the lung volume was extracted using voxel transformation and figure-ground separation, which includes the removal of any opacity from the lung volume of the input CT image. Next, a Euclidean distance map is used to locate the seed point, and region growing is applied to identify the candidate nodule region. Finally, the candidate lung nodules are segmented using a convex hull. The system was evaluated using different subsets of the LIDC dataset. Agam and Armato [79], Awai et al. [80], Fetita et al. [81], Tanino [82] and Ezoe et al. [83] also applied different morphological operations to detect lung nodule candidates in their systems. The techniques reported in the literature for lung nodule detection are summarized in Table 2-2.
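The seed-selection step attributed to Kubota et al. [70] above can be illustrated with a brute-force Euclidean distance map: the foreground pixel farthest from the background lies deepest inside the object and is a natural seed for region growing. This is a minimal 2D sketch under that assumption, not the authors' implementation; it assumes the mask contains at least one background pixel, and the function names are hypothetical.

```python
import math


def euclidean_distance_map(mask):
    """Brute-force distance from every foreground pixel to the nearest background pixel."""
    background = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if not v]
    if not background:
        raise ValueError("mask needs at least one background pixel")
    return [[min(math.hypot(r - br, c - bc) for br, bc in background) if v else 0.0
             for c, v in enumerate(row)]
            for r, row in enumerate(mask)]


def seed_from_distance_map(mask):
    """Foreground pixel with the largest distance value, used as a region-growing seed."""
    dist = euclidean_distance_map(mask)
    cells = [(r, c) for r in range(len(mask)) for c in range(len(mask[0]))]
    return max(cells, key=lambda rc: dist[rc[0]][rc[1]])


mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
print(seed_from_distance_map(mask))   # centre of the 3 x 3 object
```

The brute-force search is quadratic in the number of pixels and is only meant to show the idea; practical systems use linear-time distance transforms.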
Table 2-2: Review of Lung Nodule Detection Methods
CAD Systems | Year | Detection Technique
Akram et al. [57], Ko and Betke [58], Zhao et al. [59] | 2016, 2001, 2004 | Multiple Gray-Level Thresholding
Choi and Choi [12] | 2014 | Multi-Scale Dot Enhancement Filter
Gonçalves et al. [60], Chen et al. [61], Li and Doi [62] | 2016, 2012, 2004 | Hessian Matrix Based Method
Choi and Choi [18] | 2013 | Entropy Analysis
Hasanabadi et al. [63], Wiemker et al. [64], Lee et al. [65] | 2014, 2002, 2001 | Template Matching
El-Baz et al. [66] | 2013 | Template Matching and Genetic Algorithm
Cascio et al. [67] | 2012 | Stable 3D Mass-Spring Models
Soltaninejad et al. [68] | 2012 | Active Contour and K-Nearest Neighbors (K-NN) Classifier
Jiantao et al. [69] | 2011 | Thresholding and Geometric Modeling
Kubota et al. [70] | 2011 | Convexity Model and Morphological Approach
Riccardi et al. [71] | 2011 | 3D Fast Radial Transform
Namin et al. [72], Murphy et al. [73] | 2010, 2007 | Shape Index
Ozekes et al. [74] | 2008 | 3D Template Matching
Ge et al. [75] | 2005 | Adaptive Weighted K-Means Clustering
Mendonca et al. [76], Chang et al. [77], Paik et al. [38], Takizawa et al. [78] | 2005, 2004, 2004, 2003 | 3D Cylindrical and Spherical Filters
Agam and Armato [79], Awai et al. [80], Fetita et al. [81], Tanino [82], Ezoe et al. [83] | 2005, 2004, 2003, 2003, 2002 | Morphological Operators
Saita et al. [84], Oda et al. [85] | 2004, 2002 | 3D Connected-Component Labelling
Yamada et al. [86], Gurcan et al. [87], Kubo et al. [88] | 2003, 2002, 2002 | Clustering
Mekada et al. [89] | 2003 | Maximum Distance Inside a Connected Component
Brown et al. [90] | 2001 | Patient-Specific A Priori Model
Kawata et al. [91] | 2001 | Linear Discriminant Functions

In summary, the most commonly used lung nodule detection techniques can broadly be classified into three categories: (i) thresholding, (ii) template matching and (iii) morphological approaches. Each technique has its own pros and cons. Thresholding-based techniques have the major issue of threshold value adjustment. Template matching techniques struggle with the irregular shapes and diversity of lung nodules, and their spherical and cylindrical shape assumptions make it difficult to detect nodules attached to the pleura and vessels efficiently. Morphological approaches suffer from low detection efficiency for lung wall nodules.
2.3 False Positive Reduction
After nodule candidate detection, the candidates must be classified into nodules and non-nodules. In the literature, this step is commonly referred to as false positive reduction, and it comprises two steps: (i) feature extraction and (ii) classification of candidate nodules into nodules and non-nodules. Several methods for extracting image features and classifying nodules have been proposed in the literature. The most commonly used features are intensity-based statistical features, geometric features and gradient features [30,92]. Using the extracted feature vectors, nodules are detected with a reduced number of false positives through various supervised and unsupervised classifiers [31-33,35,93-95]. Some methods detect nodules with pixel/voxel-based machine learning without explicit feature calculation [96-98].
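As a rough illustration of the geometric and intensity-based statistical features mentioned above, the sketch below computes a few simple descriptors for one candidate region given as pixel coordinates. The particular features (area, bounding-box extent, elongation and intensity statistics) and the function name are illustrative assumptions rather than the feature set of any cited study.

```python
import statistics


def candidate_features(image, pixels):
    """A few geometric and intensity features of one candidate region.

    image is a 2D gray-level slice (nested lists); pixels lists the (row, col)
    coordinates belonging to the candidate.
    """
    values = [image[r][c] for r, c in pixels]
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return {
        # geometric features
        "area": len(pixels),
        "extent": len(pixels) / (height * width),          # fraction of bounding box filled
        "elongation": max(height, width) / min(height, width),
        # intensity-based statistical features
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "max": max(values),
    }


slice_ = [[10, 20],
          [30, 40]]
print(candidate_features(slice_, [(0, 0), (0, 1), (1, 0), (1, 1)]))
```

Each candidate then becomes one feature vector, and these vectors are what a classifier such as an SVM or k-NN is trained on.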
We briefly review the related work below, highlighting the challenges that have motivated the work in this thesis. In 2009, Cuenca et al. [32] proposed a CAD system for solitary pulmonary nodule detection in CT images using an iris filter. The system was evaluated on a private dataset, achieving a sensitivity of 80% with 7.7 FP/scan. The dataset is private and contains relatively few nodules (77), so the system's performance is likely to be affected across the broad range of nodule types present in clinical scans.
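Sensitivity and false positives per scan (FP/scan), quoted for every system in this review, reduce to simple ratios once detections have been matched to reference nodules. The counts in the usage line below are hypothetical, the function name is illustrative, and the matching rule that decides which detections count as true positives is assumed to have been applied beforehand.

```python
def detection_metrics(true_positives, total_nodules, false_positives, num_scans):
    """Sensitivity (%) and mean false positives per scan for a CAD evaluation."""
    sensitivity = 100.0 * true_positives / total_nodules
    fp_per_scan = false_positives / num_scans
    return sensitivity, fp_per_scan


# Hypothetical evaluation: 40 of 50 reference nodules found, 120 false marks over 20 scans.
print(detection_metrics(40, 50, 120, 20))   # (80.0, 6.0)
```

Because FP/scan is normalized by the number of scans rather than by the number of candidates, two systems with the same sensitivity can differ widely in FP/scan, which is why both figures are reported together.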
Murphy et al. [99] proposed a CAD system using local image features and k-nearest-neighbor classification. The system was evaluated on a private dataset of 813 scans, detecting pleural and non-pleural nodules of size 2-14 mm with a sensitivity of 80% at 4.2 FP/scan. Although the system uses a large dataset for its evaluation, it underperforms in terms of sensitivity.
Guo et al. [100] proposed an adaptive lung nodule detection algorithm consisting of a feature selection part and a classification part. In feature selection, eight features were selected after extraction, and an SVM was applied as the classifier. The system shows satisfactory sensitivity, but it was not evaluated on a standard dataset, and the dataset used is very small: 29 scans with 2 mm slice thickness containing only 34 true nodules.
Liu et al. [101] presented a CAD-based pulmonary nodule detection method based on the analysis of enhanced voxels in 3D CT images. The method consists of multiple steps, including lung segmentation, enhancement of candidate nodules, voxel feature extraction and classification with an SVM. The system shows good performance, achieving a sensitivity of 93.75% at 4.6 FP/scan, but the dataset consists of only 32 cases containing 33 solitary nodules.
Retico et al. [102] proposed a fully automated system to detect pleural nodules in low-dose CT images. A feature set of 12 texture and morphological features was extracted from each nodule candidate. The system achieved a sensitivity of 72% with 6 FP/scan, which shows that it underperforms in terms of sensitivity.
In 2010, Messay et al. [92] proposed a system for lung nodule detection in CT images. A set of 245 features was extracted, of which 40 were selected. The system was evaluated on the LIDC dataset and detected juxta-vascular, juxta-pleural and solitary nodules of size 3-30 mm, achieving a sensitivity of 82.66% with 3 FP/scan. The system showed good overall performance but underperforms in terms of sensitivity.
Ozekes et al. [103] proposed a computerized lung nodule detection method using 3D feature extraction and learning-based algorithms. The system claimed a sensitivity of up to 100%, but it gives no information about the types of nodules considered, and a false positive rate of 44 per scan makes the scheme inefficient.
Sousa et al. [104] developed a method for automatic detection of lung nodules in CT images. They used a subset of features to reduce the complexity and increase the speed of the system: 24 features were initially extracted, from which the eight best were selected. The system obtained 0.42 FP/scan, an FN of 0.15 and a sensitivity of 84.84%. However, the system was tested on only 33 nodules (23 benign and 10 malignant), so its performance is likely to be affected in other scenarios.
In 2011, Niemeijer et al. [94] showed that combining different CAD systems can increase performance compared to individual systems. The results of two challenges, ANODE09 and ROC09, in which many state-of-the-art systems participated, were collected and combined; ANODE09 consists of 55 lung CT scans. The combined results outperformed the individual systems, leading to the conclusion that combining different techniques can produce promising results.
In 2012, Mabrouk et al. [17] proposed a technique for automatic classification of lung nodules in CT images using two classifiers. A total of 22 image features were extracted, and a Fisher score ranking method was used for feature selection to select the best ten features. The system showed good results for large nodules but failed to detect smaller nodules.
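The Fisher score ranking used for feature selection can be sketched as follows for a two-class problem (nodule vs. non-nodule), using the common form (μ1 − μ2)² / (σ1² + σ2²) computed per feature. This particular variant is an assumption, since weighted multi-class forms also exist and the exact form used in [17] is not given here; the function names are hypothetical.

```python
import statistics


def fisher_score(pos, neg):
    """Two-class Fisher score of one feature: (mean_pos - mean_neg)^2 / (var_pos + var_neg)."""
    num = (statistics.fmean(pos) - statistics.fmean(neg)) ** 2
    den = statistics.pvariance(pos) + statistics.pvariance(neg)
    if den == 0:                       # both classes constant for this feature
        return float("inf") if num > 0 else 0.0
    return num / den


def rank_features(nodules, non_nodules, k):
    """Indices of the k features with the highest Fisher score (rows are feature vectors)."""
    scores = []
    for j in range(len(nodules[0])):
        pos = [row[j] for row in nodules]
        neg = [row[j] for row in non_nodules]
        scores.append((fisher_score(pos, neg), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


nodules = [[5.0, 1.0], [6.0, 2.0], [7.0, 3.0]]
non_nodules = [[1.0, 1.5], [2.0, 2.5], [3.0, 1.0]]
print(rank_features(nodules, non_nodules, 1))   # feature 0 separates the classes best
```

The score rewards features whose class means are far apart relative to their within-class spread, but it ranks each feature in isolation and so cannot detect redundancy between features.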
In 2013, Assefa et al. [105] proposed a method based on template matching and multi-resolution analysis for lung nodule detection. Seven statistical and two intensity-based features were extracted for the false positive reduction stage. The system achieved a sensitivity of 81%, but its very high false positive rate (35.15 FP/scan) makes the scheme inefficient.
Choi et al. [18] proposed a detection method based on hierarchical block classification. The image was divided into sub-blocks, which were analyzed on the basis of entropy, and the sub-blocks with high entropy were selected. The system attained 95.28% sensitivity with only 2.27 FP/scan. It shows good overall performance, but its ability to detect all types of nodules is limited.
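The block-entropy screening described for Choi et al. [18] can be illustrated in 2D: split a slice into fixed-size blocks, compute the Shannon entropy of each block's gray-level histogram, and keep the high-entropy blocks. The original work operates on 3D blocks; the block size, the 16-bin histogram over 8-bit values, the entropy cut-off and the function names here are hypothetical.

```python
import math
from collections import Counter


def block_entropy(block, bins=16, levels=256):
    """Shannon entropy (bits) of the binned gray-level histogram of one 2D block."""
    values = [v * bins // levels for row in block for v in row]   # 8-bit value -> bin index
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def informative_blocks(image, size, min_entropy):
    """Top-left corners of non-overlapping size x size blocks whose entropy reaches min_entropy."""
    keep = []
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            block = [row[c:c + size] for row in image[r:r + size]]
            if block_entropy(block) >= min_entropy:
                keep.append((r, c))
    return keep


slice_ = [[0, 0,   0,  64],
          [0, 0, 128, 192],
          [0, 0,   0,   0],
          [0, 0,   0,   0]]
print(informative_blocks(slice_, 2, 1.0))   # only the textured top-right block is kept
```

Uniform blocks have zero entropy and are discarded, so later, more expensive stages only see the blocks that contain structure.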
Tariq et al. [106] proposed a CAD system for pulmonary nodule detection in CT images using a neuro-fuzzy classifier. A detailed feature set containing different properties was extracted and applied to the neuro-fuzzy classifier. They claimed that the method is effective and can also detect small nodules. However, the standard datasets and metrics used to evaluate the system have not been discussed, and no information is given about the types of nodules considered.
Orozco et al. [107] proposed a novel approach to lung nodule classification in CT images without lung segmentation. Eight texture features were extracted from the histogram and the gray-level co-occurrence matrix of each CT image, and an SVM trained on these features classified the nodule candidates into nodules and non-nodules. A reliability index of 84% was achieved. However, the system was tested on a private dataset of only 38 scans with nodules, and its accuracy is low compared to other techniques.
Tartar et al. [108] proposed a method for classifying pulmonary nodules using 2-D and 3-D geometrical and intensity-based statistical features. The system achieved 90.7% accuracy, 89.6% sensitivity and 87.5% specificity, and was evaluated on a private dataset of 95 pulmonary nodules.
In 2014, Teramoto et al. [109] proposed a hybrid method for the detection of pulmonary nodules using positron emission tomography/computed tomography (PET/CT). The method was evaluated using 100 cases of PET/CT images and achieved a sensitivity of 83.0% with 5.0 FP/scan. The system uses a novel approach of combining CT and PET images but underperforms in terms of sensitivity.
Choi et al. [12] proposed a computer-aided detection method based on a 3-D shape-based feature descriptor. The 3-D shape-based feature descriptor and a wall elimination method were introduced to include juxta-pleural nodules. The system achieved a sensitivity of 97.5% with 6.76 FP/scan and was evaluated on LIDC images containing 148 nodules. It shows good overall performance but underperforms in terms of FP/scan.
In 2015, Ginneken et al. [110] proposed a system that used a convolutional neural network to extract the features for a lung nodule detection system. 2D axial, sagittal and coronal patches were extracted for each nodule candidate, and 4096 features were extracted from the penultimate layer of the network. A linear SVM was used to classify the candidates into nodules and non-nodules. The system achieved a sensitivity of 78%.
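Extracting the three orthogonal 2D patches around a candidate, as in Ginneken et al. [110], amounts to slicing a 3D volume along three axes. The sketch below assumes the volume is indexed as volume[z][y][x] and that the candidate lies at least `half` voxels from every border; the CNN feature extraction that follows is not shown, and the function name is hypothetical.

```python
def orthogonal_patches(volume, center, half):
    """Axial, coronal and sagittal patches of size (2*half+1)^2 centred on a candidate.

    volume is indexed volume[z][y][x]; center is (z, y, x) and must lie at least
    `half` voxels away from every border of the volume.
    """
    z, y, x = center
    offsets = range(-half, half + 1)
    axial = [[volume[z][y + i][x + j] for j in offsets] for i in offsets]      # fixed z
    coronal = [[volume[z + i][y][x + j] for j in offsets] for i in offsets]    # fixed y
    sagittal = [[volume[z + i][y + j][x] for j in offsets] for i in offsets]   # fixed x
    return axial, coronal, sagittal


# Toy 5 x 5 x 5 volume whose voxel value encodes its (z, y, x) position.
volume = [[[100 * z + 10 * y + x for x in range(5)] for y in range(5)] for z in range(5)]
axial, coronal, sagittal = orthogonal_patches(volume, (2, 2, 2), 1)
print(axial[0][0], coronal[0][0], sagittal[0][0])
```

All three patches share the candidate voxel at their centre, so each view gives the network a different cross-section through the same location.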
In 2016, Akram et al. [57] proposed SVM-based classification of lung nodules using hybrid features from CT images. 2D and 3D geometric and intensity-based statistical features were extracted and used to train the classifier. A sensitivity of 95.31% is claimed, but no information is given about FP/scan. In addition, the number of nodules used to validate the results is too small, so the system's performance is likely to be affected in other scenarios.
Setio et al. [111] proposed a multi-view convolutional network based lung nodule detection system. The system employed three dedicated detectors for large, subsolid and solid nodules, and the final detection combined multiple streams of 2D convolutional networks using a dedicated fusion method. The system was evaluated using 888 scans of the LIDC-IDRI dataset, with additional evaluation on the ANODE09 and DLCST datasets, and achieved a detection sensitivity of 90.1% with 4 FP/scan.
Anirudh et al. [112] proposed a lung nodule detection system using 3D convolutional neural networks, in which the 3D CNN learns discriminative features for nodule detection. The system starts from a point label on a single voxel of the nodule and its estimated size, and unsupervised learning is used to estimate the final 3D label that is used to train the network. The system was evaluated using 67 scans of the SPIE-LUNGx dataset and achieved a sensitivity of 80% with 10 FP/scan.
Jacobs et al. [113] compared the performance of two commercial and one academic state-of-the-art CAD systems on 888 scans of the LIDC-IDRI dataset containing 777 nodules. The study also demonstrated that CAD can help to find nodules missed by radiologists in the two-phase annotation process.
In 2017, Ding et al. [114] proposed a lung nodule detection system based on deep convolutional neural networks, consisting of two stages. In the first stage, a region-based convolutional neural network is applied to image slices for nodule detection; in the second stage, a 3D convolutional neural network is applied for false positive reduction. The system was evaluated on the Lung Nodule Analysis 2016 challenge (LUNA16) and achieved a high sensitivity of 94.4% with 4 FP/scan.
Setio et al. [115] developed LUNA16 (Lung Nodule Analysis 2016), an objective evaluation framework to compare state-of-the-art CAD systems on the largest available public dataset, LIDC-IDRI, and investigated the possibility of combining different methods. Combining the outputs of different CAD systems yielded much better performance. 888 scans of the dataset containing 1186 nodules were used to compare these systems. The study also demonstrated that CAD can help to find nodules missed by radiologists in the annotation phase.
Zhu et al. [116] proposed an automatic lung nodule detection and classification system named ‘DeepLung’, consisting of two main parts: nodule detection and classification. The detector was a three-dimensional Faster Region-based Convolutional Neural Network (Faster R-CNN). The detector was evaluated on the LUNA16 dataset using 10-fold cross-validation, while the classifier was validated on the LIDC-IDRI dataset. The system achieved a detection sensitivity of 83.4%. The review of these CAD systems is summarized in Table 2-3.
Table 2-3: Review of Current CAD Systems (N/A means not available; FPR is given in false positives per scan)
CAD Systems | Data Set | No. Cases | No. Nodules | Extracted Features | Sensitivity (%) | FPR | Remarks
Cuenca et al. [32] | Private | 22 | 77 | Intensity, Morphological | 80.00 | 7.70 | Used dataset is too small, containing few nodules.
Guo et al. [100] | Private | 29 | 34 | Shape | 94.77 | N/A | (as above)
Sousa et al. [104] | Private | N/A | 33 | Shape, Texture, Gradient, Histogram, Spatial | 84.84 | 0.42 | (as above)
Liu et al. [101] | Private | 32 | 33 | N/A | 93.75 | 4.60 | (as above)
Orozco et al. [107] | LIDC, ELCAP | 128 | 75 | Texture | 84.00 | 7.00 | Systems underperform in terms of sensitivity/accuracy.
Tartar et al. [108] | Private | 63 | 95 | Shape | 89.60 | 7.90 | (as above)
Messay et al. [92] | LIDC | 84 | 143 | Shape, Intensity, Gradient | 82.66 | 3.00 | (as above)
Murphy et al. [99] | Private | 813 | 1518 | Shape Index, Curvedness | 80.00 | 4.20 | (as above)
Retico et al. [102] | Private | 42 | 102 | Morphological, Texture | 72.00 | 6.00 | (as above)
Teramoto et al. [109] | Private | 100 | 103 | Shape, Intensity | 83.00 | 5.00 | (as above)
Ozekes et al. [103] | LIDC | 11 | 11 | Shape | 100.00 | 44.00 | High false positive rate makes the schemes inefficient.
Assefa et al. [105] | ELCAP | 50 | 165 | Intensity, Statistical | 81.00 | 35.15 | (as above)
Choi et al. [12] | LIDC | 84 | 148 | Shape-Based 3D Descriptor | 97.50 | 6.76 | (as above)
Mabrouk et al. [17] | Private | 12 | N/A | Shape, Intensity | 97.00 | 2.00 | System failed to detect smaller nodules.
Choi et al. [18] | LIDC | 58 | 151 | Shape, Intensity | 95.28 | 2.27 | System's ability to detect all types of nodules is limited.
Akram et al. [57] | LIDC | 47 | 50 | Shape, Intensity | 95.31 | N/A | System is evaluated with a small number of nodules and FP/scan is not reported.
Ginneken et al. [110] | LIDC | 865 | 1147 | Convolutional Neural Network | 78.00 | 4.00 | CNNs may have a high computational cost and require a large dataset for training, which is not mentioned in the last two studies.
Setio et al. [111] | LIDC | 888 | 1186 | Convolutional Neural Network | 90.10 | 4.00 | (as above)
Anirudh et al. [112] | SPIE-AAPM LUNGx | 67 | N/A | Convolutional Neural Network | 80.00 | 10.00 | (as above)
Ding et al. [114] | LIDC | 888 | 1186 | Convolutional Neural Network | 94.40 | 4.00 | (as above)

In Table 2-3, the studies are summarized in specific groups. The first group contains studies that used small datasets with a small number of nodules; the performance of these systems is likely to worsen in more realistic scenarios with the broader range of nodule types present in clinical scans. The second group contains studies that underperform, having relatively lower accuracy/sensitivity than other systems. The third group contains studies in which a high false positive rate is the major issue. The last group contains other studies, highlighting some additional challenges.
2.4 Problem Statement
In summary, the review of current schemes shows that they lack the ability to detect all nodules while maintaining high sensitivity and a low number of false positives per scan. Most of the algorithms are optimized for, and limited to, a particular dataset, which limits the generalization of the results. In addition, current schemes have not been evaluated on sufficiently large datasets to demonstrate robustness, so methods evaluated on fewer nodules are not guaranteed to deliver the same performance in all circumstances. Moreover, since feature extraction is very important for distinguishing nodules from the other anatomical structures in the lung region, the choice of an optimum feature set for nodule detection, whether via conventional feature-based approaches or convolutional neural networks, remains an unresolved issue. Thus, the real challenge is to build systems that are more accurate in terms of sensitivity and FP/scan across a more diverse set of nodules.
In this thesis, we present a novel technique for pulmonary nodule detection using a hybrid feature set and an SVM classifier. The proposed feature set was obtained after rigorous experimentation and has helped to reduce the false positives significantly. Prior to nodule detection, an image enhancement technique is used to increase the detection rate of low-density nodules, which has helped to increase the sensitivity of the proposed system. A fully automated lung segmentation technique is applied using optimal thresholding and connected component labeling. To the best of our knowledge, no similar technique combining these steps has been reported. In addition to SVM, different classifiers have been used to evaluate the performance of the proposed system. Finally, an attempt has been made to determine the most relevant feature class in the extracted feature set. Compared with previous methods, the overall sensitivity has been improved and FP/scan has been reduced significantly.
Chapter 3: PROPOSED SCHEME FOR LUNG NODULE DETECTION
The proposed methodology consists of a series of steps: pre-processing, followed by lung segmentation, image enhancement, nodule detection, feature extraction and classification of lung nodules. The block diagram of the proposed method is shown in Figure 3-1. After pre-processing, the lung image is thresholded using optimal thresholding, and background removal and hole filling are then applied to the thresholded image to segment the lungs. Contour correction is performed using morphological operations to include juxta-pleural nodules. Before ROI extraction, i.e. identifying the candidate nodules, it is very important to ensure that all candidate nodules are included; to this end, the contour-corrected image is enhanced. The candidate nodules are then detected and segmented simultaneously. Next, features are extracted from the lung nodule candidates and used for classification with an SVM classifier. Each step of the proposed method is described in detail in the following sections.
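One common way to realize contour correction with morphological operations is a binary closing (dilation followed by erosion), which fills indentations in the lung border left where a juxta-pleural nodule was excluded by thresholding. This minimal pure-Python sketch uses a square structuring element and treats pixels outside the image as background; the element size actually used in this work is not stated here, so `radius` and the function names are hypothetical.

```python
def _window(mask, r, c, radius):
    """Values in the square window around (r, c); pixels outside the image count as 0."""
    rows, cols = len(mask), len(mask[0])
    for rr in range(r - radius, r + radius + 1):
        for cc in range(c - radius, c + radius + 1):
            yield mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def dilate(mask, radius=1):
    """A pixel becomes 1 if any pixel in its window is 1."""
    return [[int(any(_window(mask, r, c, radius))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask, radius=1):
    """A pixel stays 1 only if every pixel in its window is 1."""
    return [[int(all(_window(mask, r, c, radius))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def close_mask(mask, radius=1):
    """Binary closing: dilation then erosion with a (2*radius+1)^2 square element."""
    return erode(dilate(mask, radius), radius)


# Lung mask with a one-pixel notch on its upper border (e.g. an excluded juxta-pleural nodule).
lung = [[0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0]]
closed = close_mask(lung)
print(closed[1][3])   # the notch is filled
```

A larger `radius` fills wider indentations but also rounds off genuine concavities of the lung border, so the element size trades nodule inclusion against over-smoothing.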
Figure 3-1: Flow Chart of the Proposed Method
[Flow chart stages: Input CT Image → Pre-Processing → Lung Parenchyma Segmentation (Optimal Thresholding, Background Removal, Hole Filling, Contour Correction of Lung Lobes) → Image Enhancement → Candidate Nodule Detection → Feature Extraction → SVM → Nodule / Non-Nodule]
3.1 Lung Segmentation
Lung segmentation is critically important because it is a prerequisite for nodule detection: inaccurate lung volume segmentation can lower the accuracy of the whole system. In this thesis, we propose a fully automated method for segmenting the lung volume from CT images.
3.1.1 Lung Image Preprocessing
Initially, the input CT images need preprocessing, which includes gray-scale conversion of the pixels and denoising. Each pixel of a DICOM image occupies 2 bytes (16 bits), of which 12 bits are significant, so its gray-scale range is 0 to 4095; this range is converted to 0 to 255. The conversion is performed because the range of 4096 levels is too large, and reducing it improves processing speed and saves pixel storage space. After gray-scale conversion, any noise present is removed from the image. Noise is inevitably introduced in the CT imaging process, and if it is not properly removed at this stage it can lead to false segmentation and classification, resulting in missed or false nodule detections. Several image preprocessing techniques can eliminate noise and reduce this error; the most commonly used are median filtering, mean filtering, Gaussian filtering, Wiener filtering and the wavelet transform [13].
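The 12-bit to 8-bit gray-scale conversion described above can be sketched as a linear rescaling of 0 to 4095 onto 0 to 255. This is a minimal illustration of exactly that mapping; it deliberately ignores the DICOM rescale slope/intercept and any lung window, which a practical pipeline may apply instead, and the function name is hypothetical.

```python
def to_8bit(pixels, bits_stored=12):
    """Linearly map stored values 0 .. 2**bits_stored - 1 (0..4095 for 12 bits) to 0..255."""
    max_in = (1 << bits_stored) - 1
    # Clamp out-of-range values first, then rescale and round to the nearest integer level.
    return [[round(min(max(v, 0), max_in) * 255 / max_in) for v in row] for row in pixels]


print(to_8bit([[0, 2048, 4095]]))   # [[0, 128, 255]]
```

Each output level then covers about 16 of the original stored values, which is the loss of gray-level resolution accepted in exchange for faster processing.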
In this dissertation, median filtering [117] is used to remove any noise present in the initial CT images. The median filter effectively removes salt-and-pepper and speckle noise while preserving image details such as edge information and image features, and its operating principle is simple and fast.
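A minimal median filter over a (2r+1) × (2r+1) window is shown below to illustrate why an isolated salt-and-pepper pixel is removed while a uniform region is left unchanged. It is a pure-Python sketch; the window radius used in this work is not stated here, the function name is hypothetical, and border windows are simply clipped to the image, which is one of several possible border policies.

```python
import statistics


def median_filter(image, radius=1):
    """Median filter with a (2*radius+1)^2 window, clipped at the image borders."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            out_row.append(statistics.median_low(window))   # always a value from the window
        out.append(out_row)
    return out


noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
print(median_filter(noisy))   # the isolated "salt" pixel is replaced by 10
```

Unlike a mean filter, the median does not average the outlier into its neighbours, which is why edges and small structures survive the filtering better.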
3.1.2 Lung Parenchyma Segmentation
3.1.2.1 Overview of Image Segmentation Methods
After image preprocessing, the first step is to segment the lung region. Lung segmentation is the process of extracting the lung volume from the input CT image and removing the background and other irrelevant components. Since pulmonary nodules lie mainly within the lung volume, the segmentation of the lung region must be accurate and complete and should not contain any irrelevant information. Accurate lung segmentation reduces the workload of doctors and is one of the most important steps of a lung CAD system.
Common image segmentation methods are region-based segmentation, boundary-based segmentation and segmentation methods based on specific theories. Region-based segmentation methods mainly include thresholding, region growing, region splitting and clustering [56]. Thresholding is one of the most commonly used of these. It exploits the histogram of lung CT images, which is typically bimodal, and is quite effective for initial segmentation. Types of thresholding include global, adaptive, optimal, multiple and variable thresholding. Region growing obtains larger regions from an initial seed point by grouping neighboring pixels that satisfy a predefined similarity criterion. The key to region growing is the selection of the initial seed point and the growth rule. This method gives good segmentation results, but seed point selection involves human intervention and therefore cannot meet the requirement of automatic processing. Boundary-based segmentation detects edge pixels from the gray-level gradient and links them into boundary contours, which are then used to obtain the final segmented image. The image gradient can be calculated using the Sobel, Canny, Roberts or Gaussian operator. This method is very sensitive to edges, but it can produce spurious, missing or discontinuous edges, leading to inaccurate segmentation. Furthermore, it is quite sensitive to noise and does not work well on low-contrast images.
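To make the seed-and-criterion idea of region growing concrete, here is a minimal sketch on a 2D slice. The growth rule (4-connected neighbors whose intensity is within a tolerance `tol` of the seed intensity) and the function name `region_grow` are illustrative assumptions. The sketch also shows why a seed point has to be supplied, which is the manual step noted above.

```python
from collections import deque

import numpy as np


def region_grow(img: np.ndarray, seed: tuple, tol: float) -> np.ndarray:
    """Grow a region from `seed` (row, col), adding 4-connected neighbors
    whose intensity differs from the seed intensity by at most `tol`."""
    h, w = img.shape
    seed_val = float(img[seed])
    mask = np.zeros(img.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(float(img[ny, nx]) - seed_val) <= tol):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask


# Two separate dark "lung" regions on a bright "body" background.
slice_ = np.full((50, 50), 200.0)
slice_[10:20, 10:20] = 50.0
slice_[30:40, 30:40] = 50.0
region = region_grow(slice_, seed=(15, 15), tol=20.0)
print(region.sum())  # → 100: only the region containing the seed is grown
```

The second dark region is never reached because it is not connected to the seed, so each lung would need its own seed point.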
3.1.2.2 Proposed Method of Lung Volume Segmentation
The segmentation of the lung volume is the basis of the subsequent nodule detection. Since the doctor focuses only on the area within the lung volume, it is necessary to eliminate irrelevant information outside it and to reduce the amount of information the doctor has to observe, which is important in pulmonary nodule examination. In the original CT images, the density of the lung volume differs from that of the background, which allows intensity-based segmentation techniques to be used effectively. We have not used the region-growing algorithm, mainly because it requires manual selection of the seed point and is relatively slow. In this research, we have used optimal thresholding followed by connected component labeling and contour correction [49,118]. The proposed workflow for lung volume segmentation is shown in Figure 3-2.
Figure 3-2: Lung Parenchymal Segmentation Flow Chart
Lung segmentation consists of a series of steps. Initially, the CT image is segmented using optimal thresholding, the lung volume is obtained using connected component labeling, and other irrelevant information is removed. The resulting image contains holes, which are filled with a hole-filling algorithm, e.g. morphological operations. Finally, the lung contour is smoothed by a rolling ball algorithm so that any juxta-pleural nodules are included. Each of these steps is described in detail in the following sections.
3.1.2.3 Thresholding
For optimal thresholding, let $T_i$ be the threshold after the $i$-th step. The lung CT scan can be divided into two density groups. The gray-scale values of a lung CT scan normally vary from 26 to 250 (-1000 HU to +1000 HU). The lung area, also called the non-body area, is a low-density area whose gray-scale values range from 50 to 150 (-910 HU to -500 HU) [49]. The CT scanner area is also part of the non-body area, while the body area contains the surroundings of the lung region. Because the lungs lie in the non-body area, we initially select a threshold value of 150 (-500 HU) for $T_0$. To select a new threshold, we apply $T_i$ to the lung image. Let $\mu_o$ and $\mu_b$ be the mean intensities of the object and the background in the lung region, respectively; the new threshold is then given by [49]:
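$$T_{i+1} = \frac{\mu_o + \mu_b}{2} \qquad (3.1)$$
where $\mu_o$ and $\mu_b$ are calculated as:
$$\mu_o = \frac{\sum_{i=0}^{T_i} i\,p_i}{\sum_{i=0}^{T_i} p_i} \qquad (3.2)$$
$$\mu_b = \frac{\sum_{i=T_i+1}^{L-1} i\,p_i}{\sum_{i=T_i+1}^{L-1} p_i} \qquad (3.3)$$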
where $p_i$ is the probability of gray value $i$ and $L$ is the number of gray levels. This iterative procedure continues until the threshold converges, i.e. until the difference between $T_{i+1}$ and $T_i$ is less than a predefined value. At this point the iteration stops and an optimal threshold $T_{op}$ is obtained. An initial segmented lung image volume $f(x, y, z)$ can then be obtained as follows:
$$f(x,y,z) = \begin{cases} 1, & f(x,y,z) \ge T_{op} \\ 0, & f(x,y,z) < T_{op} \end{cases} \qquad (3.4)$$
where x and y are the pixel coordinates within a slice and z is the slice number. The volume consists of z slices, each with dimensions of x × y pixels. Results of optimal thresholding on a few sample images can be seen in column (b) of Figure 3-3.
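The iteration in Eqs. (3.1)-(3.4) can be sketched as follows. This is a minimal illustration that assumes the volume has already been mapped to integer gray levels 0-255 (as in the gray-scale ranges quoted above) and uses an assumed stopping tolerance `eps`. The function name `optimal_threshold` is hypothetical.

```python
import numpy as np


def optimal_threshold(volume: np.ndarray, t0: float = 150.0, eps: float = 0.5,
                      levels: int = 256, max_iter: int = 100) -> float:
    """Iterative optimal threshold of Eqs. (3.1)-(3.3) on integer gray levels."""
    hist = np.bincount(volume.ravel().astype(np.int64), minlength=levels)[:levels]
    p = hist / hist.sum()                      # p_i: probability of gray level i
    g = np.arange(levels)
    t = t0                                     # T_0 = 150 (about -500 HU)
    for _ in range(max_iter):
        low = g <= t                           # gray levels 0..T_i
        high = ~low                            # gray levels T_i+1..L-1
        mu_o = (g[low] * p[low]).sum() / max(p[low].sum(), 1e-12)     # Eq. (3.2)
        mu_b = (g[high] * p[high]).sum() / max(p[high].sum(), 1e-12)  # Eq. (3.3)
        t_new = 0.5 * (mu_o + mu_b)            # Eq. (3.1)
        if abs(t_new - t) < eps:               # convergence test
            return t_new
        t = t_new
    return t


# Toy bimodal volume: half the voxels at gray level 60, half at 200.
vol = np.full((4, 8, 8), 60, dtype=np.uint8)
vol[:, :, 4:] = 200
t_op = optimal_threshold(vol)
binary = vol >= t_op                           # Eq. (3.4)
print(t_op, binary.sum())
```

For this toy volume the threshold moves from 150 to the midpoint of the two class means, 130, and stays there. Real CT data in HU would first need to be windowed and rescaled to non-negative integer levels for `np.bincount` to apply.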
After optimal thresholding, we obtain a lung CT image that contains body and non-body areas: white pixels belong to the non-body area and black pixels to the body area. We are interested in extracting the lung region from the non-body voxels. To achieve this, we apply 3D connected component labeling to the initially thresholded image $f(x, y, z)$. We use an 18-connected neighborhood to obtain the 3D connected components, in which two voxels are neighbors if they share a face or an edge. After labeling, the lung regions are selected based on the size of these volumes. The air around the body is easily removed because it is connected to the border of the volume. Using this technique, the first and second largest volumes are selected. Most of the unwanted components (air outside the body and gas in the intestine) are not selected and are hence removed. The resulting image contains holes in the lung region, which may be potential nodules or vessels. These must be included in the lung region for accurate detection and are therefore filled using morphological operations. The resulting image at this stage can be seen in column (c) of Figure 3-3. The hole-filled image may still exclude potential nodules attached to the lung border, known as juxta-pleural nodules. These nodules must be included for accurate detection, and to include them we use a rolling ball algorithm [118].
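The labeling, selection and hole-filling steps can be sketched with `scipy.ndimage`. In this minimal sketch, `generate_binary_structure(3, 2)` gives the 18-connected neighborhood. Three details are our assumptions because they are not spelled out above: components touching the in-plane image border are discarded, the two largest remaining components are kept, and holes are filled slice by slice. `extract_lungs` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage


def extract_lungs(binary: np.ndarray) -> np.ndarray:
    """binary: thresholded volume (z, y, x), True = low-density (non-body) voxel."""
    structure = ndimage.generate_binary_structure(3, 2)    # 18-connectivity
    labels, n = ndimage.label(binary, structure=structure)
    if n == 0:
        return np.zeros(binary.shape, dtype=bool)
    sizes = np.bincount(labels.ravel(), minlength=n + 1)[1:].astype(float)
    # Air outside the body touches the in-plane border of the slices: discard it.
    border = np.unique(np.concatenate([
        labels[:, 0, :].ravel(), labels[:, -1, :].ravel(),
        labels[:, :, 0].ravel(), labels[:, :, -1].ravel()]))
    sizes[border[border > 0] - 1] = 0
    # Keep the first and second largest remaining components (the two lungs).
    keep = np.argsort(sizes)[::-1][:2] + 1
    keep = keep[sizes[keep - 1] > 0]
    lungs = np.isin(labels, keep)
    # Fill holes (vessels, nodules) in each slice.
    return np.stack([ndimage.binary_fill_holes(s) for s in lungs])


vol = np.zeros((5, 20, 30), dtype=bool)
vol[:, 0, :] = True            # air outside the body (touches the border)
vol[:, 5:15, 3:12] = True      # "left lung"
vol[2, 9, 7] = False           # a vessel/nodule hole inside it
vol[:, 5:15, 16:26] = True     # "right lung"
vol[2, 17, 14] = True          # small gas pocket
lungs = extract_lungs(vol)
print(lungs.sum(), lungs[2, 9, 7], lungs[2, 17, 14])
```

In the toy volume the border-touching air and the small gas pocket are dropped, both "lungs" are kept, and the internal hole is filled.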
3.1.2.4 Boundary Repair of Lung Parenchyma
Various methods have been proposed for repairing gaps in the lung boundaries, and the rolling ball method is one of the most popular. In this method, a two-dimensional ball (circular) filter is placed tangentially on the boundary of the lungs and rolled along the lung boundary. A gap in the lung boundary is identified when the rolling ball touches the boundary at more than one point at the same time; the gap is then filled by a new contour segment that linearly connects its two end points. The basic principle of this method is to use the change in boundary curvature and a circular filter template for morphological operation on the lung contour to achieve smoothing.
An important aspect to be considered is the size of the rolling ball filter. If the selected radius is too small, the smoothing of the lung contour has little effect, the desired lung contour is not obtained, and the detection accuracy of the system can decrease.
On the contrary, if the selected radius is too large, parts of the pleura may be included in the lung contour, adding interference to the lung nodule area and affecting the results. It is therefore very important to set an appropriate radius, chosen after a large number of validations. The final images after lung contour correction are shown in column (e) of Figure 3-3.
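A common way to approximate rolling-ball contour smoothing is a morphological closing with a disk-shaped structuring element: a disk that cannot enter a concavity of the contour bridges it. The sketch below uses that closing as a stand-in. It is not the exact tangent-point and line-segment procedure described above, and the radius values and function names are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage


def disk(radius: int) -> np.ndarray:
    """Boolean disk-shaped structuring element (the 'ball')."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius


def smooth_lung_contour(lung_slice: np.ndarray, radius: int) -> np.ndarray:
    """Approximate rolling-ball repair by binary closing with a disk (radius > 0)."""
    padded = np.pad(lung_slice, radius)   # avoid border effects of the erosion
    closed = ndimage.binary_closing(padded, structure=disk(radius))
    closed = closed[radius:-radius, radius:-radius]
    return closed | lung_slice            # closing should only add pixels


# Rectangular "lung" with a small notch left by an excluded juxta-pleural nodule.
lung = np.zeros((60, 60), dtype=bool)
lung[10:50, 10:40] = True
yy, xx = np.mgrid[0:60, 0:60]
lung &= ~((yy - 30) ** 2 + (xx - 40) ** 2 <= 16)
repaired = smooth_lung_contour(lung, radius=10)
print(lung[30, 38], repaired[30, 38])
```

With `radius=10` the notch pixel at (30, 38) is restored, while with `radius=2` it stays outside the mask. This mirrors the too-small-radius case discussed above.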
Figure 3-3: Example images of lung volume segmentation, (a) to (e) from left to right presenting input, thresholded, hole filled, lung segmented and contour corrected images, respectively.
3.1.2.5 Separation of Left and Right Lung Parenchyma
Another aspect that needs to be explained is the separation of the right and left lung regions. After extraction of the lung volume, gray-level thresholding in many cases fails to separate the right and left lungs completely, and a junction may remain between them. This junction must be removed before further processing, including smoothing of the lung contours. The pseudo-code for lung separation is as follows:
1. Binarize the input connected lung region.
2. Set the horizontal axis to x = 256 for an image of size 512 × 512.
3. Around x, set a horizontal scan range of 20 points on each side, i.e. 236 to 276.
4. For each column in this range, fix the horizontal coordinate and scan the column starting from a vertical coordinate of 0.
5. Record the coordinate of the first pixel with the maximum gray value (foreground), continue scanning, and then record the coordinate of the first subsequent pixel with value 0.
6. Store the difference ΔL between these two coordinates.
7. Repeat steps (4) to (6) for the whole scan range.
8. Compare the changes in ΔL over the 20 columns to the left of x = 256 and then over the 20 columns to the right, and follow the side on which ΔL first becomes smaller. Record that value and compare it with the values on the other side; if it is still the smallest value, the column is taken as the separation position. Otherwise, move the axis from the original coordinate 256 towards the side where ΔL is smaller.
9. Repeat steps (4) to (8) until the separation position is found.
Once the separation position is found, the gray values at that location were set to 0, separating the left and right lungs. This process is shown in Figure 3-4, where (b) represents the output image of this algorithm.
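The following sketch implements a simplified reading of steps 4-9. It computes ΔL (the length of the first foreground run) for every column in the window 236-276 and cuts the junction at the column where ΔL is smallest, instead of performing the iterative left/right comparison of step 8. The window parameters follow the text; the simplification and the function name `separate_lungs` are assumptions.

```python
import numpy as np
from scipy import ndimage


def separate_lungs(mask: np.ndarray, center: int = 256, half_width: int = 20):
    """Cut the junction between the lungs in a binary 2D slice.

    Returns (separated mask, column of the cut or None if already separated)."""
    h, w = mask.shape
    best = None                                    # (delta_L, column, start row)
    for x in range(max(center - half_width, 0), min(center + half_width + 1, w)):
        col = mask[:, x]
        fg = np.flatnonzero(col)
        if fg.size == 0:                           # lungs already separated here
            return mask.copy(), None
        start = fg[0]                              # step 5: first foreground pixel
        bg = np.flatnonzero(~col[start:])          # step 5: first 0 pixel after it
        delta_l = bg[0] if bg.size else h - start  # step 6: delta L
        if best is None or delta_l < best[0]:
            best = (delta_l, x, start)
    delta_l, x, start = best
    out = mask.copy()
    out[start:start + delta_l, x] = False          # set the junction to 0
    return out, x


mask = np.zeros((512, 512), dtype=bool)
mask[100:400, 100:250] = True                      # left lung
mask[100:400, 262:412] = True                      # right lung
mask[200:260, 250:262] = True                      # junction between them
mask[200:260, 255] = False
mask[220:230, 255] = True                          # narrowest point of the junction
separated, cut_col = separate_lungs(mask)
_, n_parts = ndimage.label(separated)
print(cut_col, n_parts)
```

Taking the global minimum over the window is the simplest reading of "the smallest value". The iterative comparison in step 8 may behave differently when ΔL has several local minima.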
Figure 3-4: (a) A parenchymal image; (b) the image after repair of the lung parenchyma; (c) a zoomed view of the left and right lung contour separation.
3.2 Image Enhancement and Nodule Detection
Image enhancement is critical for the sensitivity of a lung nodule detection system. It helps detect nodules by enhancing them, and it reduces false positives by weakening other structures in the lung region [119]. It is also necessary because some low-density nodules may otherwise remain undetected, and every potential nodule candidate must be accounted for. Because of the nature of CT imaging and the complexity of the lung parenchyma, lung nodules, blood vessels and bronchi exhibit almost the same gray-scale values. The similarity of these structures makes it almost impossible to extract the pulmonary nodules accurately. To extract suspected pulmonary nodules accurately, it is necessary to change the gray-scale values of these regions of interest and increase the contrast between nodules and other structures, so that suspected nodules can be screened out. Since we are interested in accurate extraction of pulmonary nodules, which are our region of interest (ROI), image enhancement must be performed on the ROI, including low-density nodules (also known as ground-glass nodules), before nodule detection. Enhancement of the ROI highlights the suspected pulmonary nodules in the image and reduces the gray-scale values of the non-nodular regions, so that the contrast between the two improves and the suspected pulmonary nodules are more likely to be extracted.
Image enhancement highlights some features and weakens others, depending on the specific circumstances, to obtain a better image. It has two main objectives: first, to improve the visual perception of the image; second, to support the analysis and decomposition of the region of interest so that the information it contains can be studied.
In summary, image enhancement is one of the key steps in the detection of pulmonary nodules. The enhancement of the ROI is divided into three steps: the first is to obtain the lung parenchyma region, which was completed in the previous section; the second is the enhancement of the ROI; and the third is the classification of candidate nodules into nodules and non-nodules based on their characteristics, for initial screening.
3.2.1 Theoretical Research on Image Enhancement Algorithm
There are many image enhancement methods, such as contrast stretching, gamma correction, histogram equalization, and frequency-domain and spatial-domain sharpening [120]. Enhancement techniques can be divided into two categories: (i) spatial-domain and (ii) frequency-domain techniques. In either domain, the aim is to highlight useful image information and suppress interfering information, so that the converted image provides a good basis for the next stage and the required image regions can be extracted more easily.
3.2.2 Multi-Scale Enhancement Algorithm Based on Hessian Matrix
Because nodules are shaped like circular or dot-like objects, the gray-scale values of such shapes can be selectively enhanced. A conventional spatial enhancement algorithm enhances vascular tissue, bronchial tissue and nodules in the image alike, but it mainly emphasizes the edges of these structures, and the gain in gray-scale contrast is relatively limited. To extract nodules better, it is necessary to enhance the intensity of circular regions of the image and suppress the gray-scale values of blood vessels, thereby enhancing the area of interest as a whole. The area of interest here is the suspected pulmonary nodules. However, pulmonary nodules and some other lung tissues, such as cross-sections of blood vessels, both appear as circular dense regions of similar density, which is one of the most important difficulties in the extraction and detection of pulmonary nodules.
In this dissertation, we propose a multi-scale dot-enhancing filter [121] based on the Hessian matrix for image enhancement. In the first step, Gaussian smoothing [122] is performed on all the corresponding 2D slices to reduce noise and sensitivity effects. A 2D smoothing is applied because it produces promising results and reduces computational complexity. After Gaussian smoothing, the Hessian matrix and its eigenvalues, with $|\lambda_2| \le |\lambda_1|$, are calculated for every pixel to determine the local shape of the structure [119]. A suspected pulmonary nodule region appears as a circular or oval object, whereas vascular tissue appears as an elongated, line-like structure. This property can therefore be used to distinguish the different shapes present in the lung region [123]. Figure 3-5 shows the flow chart of the enhancement algorithm. Each step of the enhancement algorithm is described in detail in the following sections.
Figure 3-5: Multi-Scale Circular Filter Enhancement Algorithm Flow Chart
[Flow chart: Start → set the initial scale → Gaussian smoothing according to the scale → obtain eigenvalues from the Hessian matrix → compare the enhanced image with the previous one → scale value reached the threshold? No: modify the scale and repeat; Yes: get the enhanced image → End.]
3.2.2.1 Initial Enhancement Based on Gaussian Function
The Gaussian filter is a linear smoothing filter that is widely used in image processing. Its most important parameter is the scale parameter $\sigma$, which controls the smoothing effect. When $\sigma$ is small, edges are localized more accurately, but the smoothing effect is weaker and noise suppression is poorer. Conversely, as $\sigma$ increases, the signal becomes smoother and noise is removed more effectively, but edges become blurred and their positions may shift, which is another serious issue. The choice of $\sigma$ in the Gaussian filter is therefore important. In lung CT images, pulmonary nodules are circular or dot-like objects. A dot (the target pulmonary nodule) can be approximated by the Gaussian function [121]:
$$d(x,y) = \exp\left(-\frac{x^2+y^2}{2\sigma_0^2}\right) \qquad (3.5)$$
Based on the normal distribution in probability theory, when the diameter of a lung nodule is $\sigma_0$, the nodule accounts for 49.91% of the area of the Gaussian distribution function; when the diameter is $2\sigma_0$, it accounts for 68.26%; and when the diameter is $4\sigma_0$, it accounts for 95% [121]. In other words, to detect nodules of a specified diameter d, the scale parameter should be set to d/4. A Gaussian function with this scale parameter is the most appropriate approximation and enhances target pulmonary nodules of the specified size. The two-dimensional Gaussian function is calculated as [119]:
$$G(x,y,\sigma_f) = \frac{1}{2\pi\sigma_f^2}\exp\left(-\frac{x^2+y^2}{2\sigma_f^2}\right) \qquad (3.6)$$
The convolution response of the target nodule and the Gaussian function is given by:
$$R(x,y,\sigma_f) = G(x,y,\sigma_f) * d(x,y) \qquad (3.7)$$
At the object center (0,0), the two scales must be the same to obtain the strongest Gaussian response $R(0,0,\sigma_f)$. So, with $\sigma_0$ the set value of the object scale, when
$$\frac{\partial R(0,0,\sigma_f)}{\partial \sigma_f} = 0,$$
R reaches its maximum value, and $\sigma_f = \sigma_0$ is the optimal scale for lung nodule enhancement.
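The argument that the two scales must match can be made explicit with a short derivation. It is our assumption, not stated in the source, that the response is measured through the scale-normalized second derivative (multiplied by $\sigma_f^2$), as in standard scale-space analysis and as used by the Hessian-based filter below. For the plain convolution in Eq. (3.7) with the normalized kernel of Eq. (3.6), $R(0,0,\sigma_f)$ itself decreases monotonically with $\sigma_f$, as the first line shows.

```latex
% d(x,y) = 2\pi\sigma_0^2 G(x,y,\sigma_0) and G(.,\sigma_f) * G(.,\sigma_0) = G(.,s), s^2 = \sigma_f^2 + \sigma_0^2
R(x,y,\sigma_f) = 2\pi\sigma_0^2\, G(x,y,s)
\;\Rightarrow\;
R(0,0,\sigma_f) = \frac{\sigma_0^2}{\sigma_f^2 + \sigma_0^2}

\sigma_f^2\,\frac{\partial^2 R}{\partial x^2}(0,0,\sigma_f)
  = -\frac{\sigma_0^2\,\sigma_f^2}{(\sigma_f^2 + \sigma_0^2)^2}

\frac{\partial}{\partial\sigma_f}\left[\frac{\sigma_0^2\,\sigma_f^2}{(\sigma_f^2 + \sigma_0^2)^2}\right]
  = \frac{2\sigma_0^2\,\sigma_f\,(\sigma_0^2 - \sigma_f^2)}{(\sigma_f^2 + \sigma_0^2)^3} = 0
  \;\Longleftrightarrow\; \sigma_f = \sigma_0
```

At $\sigma_f = \sigma_0$ the normalized response magnitude reaches its maximum of 1/4. The same holds for $\partial^2/\partial y^2$, and hence for both Hessian eigenvalues at the center of an isotropic blob.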
3.2.2.2 Hessian Matrix Construction
The Hessian matrix method extracts the characteristic directions of an image through second-order differentiation. Its eigenvalues can be used to distinguish different types of points and structures in the image, such as nodules and the edges of blood vessels in a lung image. In this dissertation, since smoothing is applied to each 2D slice, the Hessian matrix at each point is the second-order real symmetric matrix [121]:
$$H = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} \qquad (3.8)$$
The corresponding characteristic equation is:
$$(\lambda - f_{xx})(\lambda - f_{yy}) - f_{xy} f_{yx} = 0 \qquad (3.9)$$
Solving it gives the two eigenvalues $\lambda_1$ and $\lambda_2$ of the Hessian matrix:
$$\lambda_1 = (K + Q)/2 \qquad (3.10)$$
$$\lambda_2 = (K - Q)/2 \qquad (3.11)$$
where K and Q are:
$$K = f_{xx} + f_{yy} \qquad (3.12)$$
$$Q = \sqrt{(f_{xx} - f_{yy})^2 + 4 f_{xy} f_{yx}} \qquad (3.13)$$
The second-order Hessian matrix has thus been constructed for the next stage. The magnitudes of its eigenvalues are the basic reference parameters of the circular enhancement filter.
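Eqs. (3.10)-(3.13) can be checked numerically against a general eigenvalue routine. In this minimal sketch the two roots are reordered so that $|\lambda_1| \ge |\lambda_2|$, which is the labeling assumed by the conditions in Section 3.2.2.3. The function name is hypothetical, and $f_{xy} = f_{yx}$ is assumed (smooth images).

```python
import numpy as np


def hessian_eigenvalues(fxx, fxy, fyy):
    """Closed-form eigenvalues of the 2x2 Hessian [[fxx, fxy], [fxy, fyy]].

    Works element-wise on arrays; returns (lam1, lam2) with |lam1| >= |lam2|."""
    K = fxx + fyy                                       # Eq. (3.12)
    Q = np.sqrt((fxx - fyy) ** 2 + 4.0 * fxy * fxy)     # Eq. (3.13), f_xy = f_yx
    a = (K + Q) / 2.0                                   # Eq. (3.10)
    b = (K - Q) / 2.0                                   # Eq. (3.11)
    swap = np.abs(a) < np.abs(b)                        # order by magnitude
    return np.where(swap, b, a), np.where(swap, a, b)


H = np.array([[-3.0, 1.0], [1.0, -2.0]])
lam1, lam2 = hessian_eigenvalues(H[0, 0], H[0, 1], H[1, 1])
print(lam1, lam2, np.linalg.eigvalsh(H))
```

Because the formulas are element-wise, the same function can be applied directly to whole derivative images rather than to one pixel at a time.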
3.2.2.3 Construction of Circular Enhancement Filter
As described earlier, suspected pulmonary nodules in the lung region are circular or oval, whereas vascular tissue is an elongated, line-like structure. It is therefore possible to distinguish the different shapes by using a circular enhancement filter that enhances suspected pulmonary nodules and suppresses vascular tissue. In this thesis, the circular enhancement filter in two-dimensional space is used. The circular structure and the line structure can be expressed as:
$$d(x,y) = e^{-\frac{x^2+y^2}{2\sigma^2}} \qquad (3.14)$$
$$l(x,y) = e^{-\frac{x^2}{2\sigma^2}} \qquad (3.15)$$
For a two-dimensional image $f(x,y)$, let $\lambda_1$ and $\lambda_2$ be the two eigenvalues of the Hessian matrix, with $|\lambda_1| \ge |\lambda_2|$. For circular and line structures, the two eigenvalues must satisfy the corresponding conditions. For circular structures [119]:
$$\lambda_1 = \lambda_2 \ll 0 \qquad (3.16)$$
and for line structures:
$$\lambda_1 \ll 0, \quad \lambda_2 = 0 \qquad (3.17)$$
Here we assume that bright objects are to be enhanced against a dark background. The filter response is calculated as:
$$E_{circle} = \begin{cases} |\lambda_2|^2 / |\lambda_1|, & \lambda_1 < 0,\ \lambda_2 < 0 \\ 0, & \text{otherwise} \end{cases} \qquad (3.18)$$
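As a small illustration of Eq. (3.18), the sketch below evaluates the circle filter on a few eigenvalue pairs. It assumes `lam1` and `lam2` have the same shape and are already ordered so that $|\lambda_1| \ge |\lambda_2|$; the function name is hypothetical.

```python
import numpy as np


def circle_response(lam1, lam2):
    """Eq. (3.18): |lam2|^2 / |lam1| where both eigenvalues are negative
    (bright blob on a dark background), 0 otherwise. Assumes |lam1| >= |lam2|."""
    lam1 = np.asarray(lam1, dtype=float)
    lam2 = np.asarray(lam2, dtype=float)
    out = np.zeros(lam1.shape)
    valid = (lam1 < 0) & (lam2 < 0)
    out[valid] = lam2[valid] ** 2 / np.abs(lam1[valid])
    return out


# circle (lam1 = lam2 << 0), line (lam2 = 0), elongated blob, dark structure
print(circle_response([-4.0, -4.0, -4.0, 2.0], [-4.0, 0.0, -1.0, 1.0]))
```

A circular structure gives a response of $|\lambda|$, a line gives 0, and an elongated blob gives a strongly reduced response. This is the shape selectivity the filter relies on.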
Because pulmonary nodule diameters vary, a single enhancement scale is not sufficient. We therefore use multi-scale enhancement filtering to optimize the extraction. Assuming that the nodules to be detected have diameters in the range $[d_0, d_1]$, the $N$ discrete smoothing scales in the range $[d_0/4, d_1/4]$ can be computed as [121]:
$$\sigma_1 = \frac{d_0}{4},\quad \sigma_2 = r\,\frac{d_0}{4},\quad \sigma_3 = r^2\,\frac{d_0}{4},\quad \ldots,\quad \sigma_N = r^{N-1}\,\frac{d_0}{4} = \frac{d_1}{4} \qquad (3.19)$$
where $r = (d_1/d_0)^{1/(N-1)}$, and each scale corresponds to a nodule diameter of $4\sigma$. The algorithm works as follows. First, the scales $\sigma$ are determined using Equation (3.19), and the image is smoothed with a Gaussian function, starting from the smallest scale and increasing it step by step. Then the two eigenvalues of the Hessian matrix, $\lambda_1$ and $\lambda_2$, are calculated, followed by the corresponding value of the $E_{circle}$ filter. This process is repeated for all scales. Finally, the filter outputs are combined by taking their maximum, which gives the best enhancement, to generate the resulting image:
$$I_D(x,y) = \begin{cases} 1, & \text{if } E_{circle,max} \\ 0, & \text{otherwise} \end{cases} \qquad (3.20)$$
where $E_{circle,max} = \max_{\sigma \in [\sigma_{min}, \sigma_{max}]} E_{circle}$. Figure 3-6 shows the results of image enhancement on different slices.
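Putting Eqs. (3.10)-(3.19) together, a minimal 2D sketch of the multi-scale loop might look as follows. It rests on three assumptions. First, derivatives are taken with `scipy.ndimage.gaussian_filter(..., order=...)`, which smooths and differentiates in one step. Second, each derivative is multiplied by σ² (scale normalization, not stated in the text) so that responses at different scales are comparable. Third, the output is the continuous maximum response $E_{circle,max}$ rather than the binary image of Eq. (3.20). The function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def nodule_scales(d0: float, d1: float, n: int) -> list:
    """Eq. (3.19): n geometric scales from d0/4 to d1/4."""
    r = (d1 / d0) ** (1.0 / (n - 1))
    return [(d0 / 4.0) * r ** k for k in range(n)]


def multiscale_dot_enhance(img, d0: float, d1: float, n: int = 5) -> np.ndarray:
    img = np.asarray(img, dtype=float)
    e_max = np.zeros_like(img)
    for s in nodule_scales(d0, d1, n):              # smallest scale first
        # Scale-normalized second derivatives of the Gaussian-smoothed slice
        # (axis 0 = y, axis 1 = x).
        fyy = gaussian_filter(img, s, order=(2, 0)) * s ** 2
        fxx = gaussian_filter(img, s, order=(0, 2)) * s ** 2
        fxy = gaussian_filter(img, s, order=(1, 1)) * s ** 2
        K = fxx + fyy
        Q = np.sqrt((fxx - fyy) ** 2 + 4.0 * fxy ** 2)
        a, b = (K + Q) / 2.0, (K - Q) / 2.0
        swap = np.abs(a) < np.abs(b)                # order so |lam1| >= |lam2|
        lam1, lam2 = np.where(swap, b, a), np.where(swap, a, b)
        e = np.where((lam1 < 0) & (lam2 < 0),
                     lam2 ** 2 / np.maximum(np.abs(lam1), 1e-12), 0.0)
        e_max = np.maximum(e_max, e)                # maximum over scales
    return e_max


yy, xx = np.mgrid[0:64, 0:64]
img = np.zeros((64, 64))
img[(yy - 20) ** 2 + (xx - 20) ** 2 <= 16] = 1.0   # bright dot (nodule-like)
img[44:47, 5:60] = 1.0                             # bright line (vessel-like)
enhanced = multiscale_dot_enhance(img, d0=4.0, d1=12.0, n=3)
print(enhanced[20, 20], enhanced[45, 32])
```

On this toy slice the dot center responds strongly, while the middle of the line gives a response that is orders of magnitude smaller, which is the vessel suppression the filter is designed for.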
Figure 3-6: Example images showing results of image enhancement on different slices. (a) and (b) show a low-density nodule (red circle) that is detected after image enhancement, whereas (c) and (d) show two other slices after image enhancement.
3.3 Lung Nodule Detection and Classification
After the preceding steps, the lung parenchyma area containing the lung nodules has been extracted. The last and most important step in computer-aided detection of pulmonary nodules, which also produces the output of the CAD system, is the detection and classification of suspected pulmonary nodules. This step involves not only image processing but also data mining techniques.
Pulmonary nodules can be detected using different techniques, including image segmentation, image matching and image enhancement. The lung parenchyma contains vascular tissue, bronchial tissue and pulmonary nodules, so a precise algorithm is needed at this step, because it directly affects the results of the classification system. The most intolerable outcome for a radiologist is a missed nodule that could be cancerous. Moreover, for a radiologist, lung disease is not limited to lung cancer; tuberculosis and pneumonia are other types of lung disease. The primary requirement of a computer-aided detection system is therefore that it should not miss any potential nodule candidate, even at the cost of some erroneous detections.
Lung nodule detection based on image segmentation relies mainly on gray-scale thresholding, and the threshold value must be properly adjusted to improve the detection of suspected pulmonary nodules. Besides threshold segmentation, other segmentation-based techniques are described in the literature review.
Second, image matching can be used to construct pulmonary nodule templates from the morphological features and gray-scale variation characteristics of nodules, which are then matched against regions of the patient's lung parenchyma. Because pulmonary nodules are of many types and the number of lung cancer patients is growing every day, the diversity of lung nodules makes the creation of a nodule template challenging. Researchers have been trying to build template libraries of lung nodules to increase the detection accuracy for early lung cancer, but this is still an unresolved issue.
Finally, spatial-domain and frequency-domain image enhancement techniques can also be used to detect suspected pulmonary nodules. What makes pulmonary nodule detection difficult is that we want to exclude as many non-nodule areas as possible without missing any potential nodule candidate. In other words, the algorithm's rules should be flexible enough to perform this task with precision and accuracy.
The classification of pulmonary nodules after detection is also known as false positive reduction, and it consists of two steps. In the first step, features are extracted from the suspected pulmonary nodules. These include intensity (gray-scale) features, shape features, texture features and some other features [30, 92]. Selecting nodule features is very important for three reasons: many features have been reported in the literature, different feature sets influence the detection results to varying degrees, and features that increase the accuracy of a lung nodule detection system can also increase its processing time.
Features differ both in their effect and in their degree of correlation. It is therefore necessary to calculate a large set of features, select the optimal feature set using feature selection and data mining techniques, and finally use one or more classifiers for lung nodule classification. The strength of the detection capability depends on the optimization of the feature set; the key lies in the number of features, their impact and their correlation [124].
3.3.1 Rule-Based Analysis of Lung Nodule Candidates
In this dissertation, lung nodule candidates are detected by applying optimal thresholding to the dot-enhanced images (the same algorithm that was used for lung thresholding, explained in the previous section). A rule-based analysis is then performed on initial measurements such as area, diameter and volume to decide whether to keep or discard each detected candidate [92]. The advantage of rule-based analysis is that it eliminates objects that are too small or too large to be nodule candidates, and thus reduces the workload of the next stage. All segmented objects must meet the following basic size requirements to be considered good nodule candidates: the computed area must lie in the range 4-908 mm², equivalent to a diameter of 2.5-34 mm, and the volume must lie in the range 8-20580 mm³. After rule-based analysis, several features are extracted from the good nodule candidates and used to train the SVM classifier in the next step. Examples of detected nodule candidates can be seen in Figure 3-7.
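The size screening can be sketched as follows. This minimal version assumes the voxel spacing (in mm) is known from the scan header and interprets "area" as the cross-sectional area in the middle occupied slice of each candidate. The area and volume limits are the ones given above; the function name and the 18-connected labeling are assumptions.

```python
import numpy as np
from scipy import ndimage


def rule_based_filter(candidates: np.ndarray, spacing,
                      area_range=(4.0, 908.0), vol_range=(8.0, 20580.0)) -> np.ndarray:
    """Keep candidate objects whose area (mm^2) and volume (mm^3) are in range.

    candidates: binary volume (z, y, x); spacing: voxel size (dz, dy, dx) in mm."""
    dz, dy, dx = spacing
    labels, _ = ndimage.label(candidates, structure=ndimage.generate_binary_structure(3, 2))
    keep = np.zeros(candidates.shape, dtype=bool)
    for lab, box in enumerate(ndimage.find_objects(labels), start=1):
        obj = labels[box] == lab
        volume = obj.sum() * dz * dy * dx
        per_slice = obj.sum(axis=(1, 2))                           # voxels per slice
        occupied = np.flatnonzero(per_slice)
        area = per_slice[occupied[len(occupied) // 2]] * dy * dx   # median slice
        if (area_range[0] <= area <= area_range[1]
                and vol_range[0] <= volume <= vol_range[1]):
            keep[box] |= obj
    return keep


cands = np.zeros((60, 100, 100), dtype=bool)
cands[2:7, 2:7, 2:7] = True          # 125 mm^3 blob: kept
cands[10, 50, 50] = True             # 1 mm^3 speck: too small
cands[15:55, 50:90, 50:90] = True    # 64000 mm^3 block: too large
kept = rule_based_filter(cands, spacing=(1.0, 1.0, 1.0))
print(kept.sum())
```

Working in millimetres rather than voxel counts keeps the rule independent of the slice thickness and pixel spacing of a particular scan.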
Figure 3-7: Examples of detected candidates: (a) nodules; (b) non-nodules. Nodule diversity and the close resemblance of nodules to other anatomical structures in the lung region make detection more challenging and produce false positives, which are reduced with the aid of a classifier.
The goal of this step is to reduce the number of false positives per scan (FP/scan). It comprises two steps, feature extraction and classification, which are briefly described in the following sections.
3.3.2 Feature Extraction
Feature extraction reduces the original data to a set of characteristics that differentiate one input from others. Nodules have their own characteristics, which differentiate them from other anatomical structures present in the lung region [18].
Pulmonary nodule features can be broadly classified into three categories, namely gray-level (intensity) features, shape (morphological) features and texture features. Together, these features describe the characteristics of a nodule, and they can also be used in rule-based analysis. A wide variety of image features can play an important role in the subsequent classification of suspected pulmonary nodules, but their roles differ: some features may be vital, while others may have no effect. This section lists the various features together with the related image characteristics of pulmonary nodules.
3.3.2.1 Shape Features
The shape (2D and 3D) features of suspected pulmonary nodules are parameters that represent the shape characteristics of the nodules. They need to capture the characteristics of the nodule itself as well as the difference between the nodule and vessel cross-sections and vessel branches. At the same time, the characteristics of a nodule can vary across image sequences. Shape features are closely related to the actual pulmonary nodule lesions and represent the nodules visually, which provides additional diagnostic reference information for doctors.
The morphology of a suspected pulmonary nodule is mainly determined by the set of its outline pixels, and the extent and changes of the contour can reflect the lung pathology of the patient. In this dissertation, the morphological characteristics of pulmonary nodules are expressed by shape features that reflect the specificity of nodule shape and are rotation invariant. The following shape features (2D and 3D) are used [17,19,30,92,108]:
Area: the number of voxels in the median slice of the suspected nodule.
Volume: the total number of voxels in the segmented nodule.
Perimeter: the number of voxels on the outline of the nodule.
Image moments and central moments: used to compute shape information such as the centroid and orientation; geometric moment invariants with rotation, scale and translation invariance can be derived from them. The gray-scale characteristics of the region are described by the moments of each order of the gray distribution within the ROI: the zero-order moment represents the mass of the region, the first-order moments give the coordinates of the regional centroid, the second-order moments give the orientation of the region, and so on.
Centroid: the center of the suspected nodule.
Major and minor axis lengths and elongation: the major and minor axes of the ellipse equivalent to the nodule area, and their ratio. The closer this ratio is to one, the rounder the object and the more likely it is to be a nodule, and vice versa.
Circularity: the degree of deviation of the nodule region from a circle, i.e. the ratio of the radius of the inscribed circle to the radius of the circumscribed circle. Here the inscribed radius equals twice the nodule area divided by its perimeter, and the circumscribed radius equals half the major axis of the equivalent ellipse. The closer circularity is to one, the more likely the object is a nodule, and vice versa.
Compactness: how close the volume of the suspected nodule is to that of a sphere; it also indicates the smoothness of the contour. The closer compactness is to one, the smoother the boundary; smaller values indicate more complex and rough edges.
The expressions of these shape features are presented in the 1st column of Table 3-1.
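Several of the 2D shape features in Table 3-1 can be computed directly from a binary nodule mask, as in the minimal sketch below. It assumes that f(x, y) is the binary mask and that the perimeter is the count of outline pixels (pixels with a 4-neighbor outside the object); the function name is hypothetical. Because a pixel-count perimeter underestimates the true contour length of curved shapes, the circularity 4πA/L² can exceed 1. In this form it is best used to compare candidates rather than read as an absolute value.

```python
import numpy as np
from scipy import ndimage


def shape_features(mask: np.ndarray) -> dict:
    mask = np.asarray(mask, dtype=bool)
    y, x = np.nonzero(mask)
    m00 = float(x.size)                          # area A = zero-order moment
    x0, y0 = x.mean(), y.mean()                  # centroid m10/m00, m01/m00
    mu20 = ((x - x0) ** 2).sum()                 # second-order central moments
    mu02 = ((y - y0) ** 2).sum()
    mu11 = ((x - x0) * (y - y0)).sum()
    root = np.sqrt((mu20 - mu02) ** 2 + 4.0 * mu11 ** 2)
    a = 2.0 * np.sqrt(2.0 * (mu20 + mu02 + root) / m00)                 # major axis
    b = 2.0 * np.sqrt(max(2.0 * (mu20 + mu02 - root) / m00, 0.0))      # minor axis
    outline = mask & ~ndimage.binary_erosion(mask)
    perimeter = float(outline.sum())
    return {"area": m00, "centroid": (x0, y0), "major": a, "minor": b,
            "elongation": a / b, "perimeter": perimeter,
            "circularity": 4.0 * np.pi * m00 / perimeter ** 2}


yy, xx = np.mgrid[0:101, 0:101]
ellipse = ((xx - 50) / 20.0) ** 2 + ((yy - 50) / 10.0) ** 2 <= 1.0
round_blob = (xx - 50) ** 2 + (yy - 50) ** 2 <= 15 ** 2
fe, fd = shape_features(ellipse), shape_features(round_blob)
print(round(fe["major"], 1), round(fe["minor"], 1), round(fe["elongation"], 2))
print(fd["circularity"] > fe["circularity"])
```

For the ellipse with semi-axes 20 and 10 pixels, the moment-based axis lengths come out close to 40 and 20, giving an elongation of about 2, while the round blob scores a higher circularity than the ellipse.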
3.3.2.2 Intensity Features
Gray-scale (intensity) features describe the distribution and frequency of gray levels in the image matrix, i.e., the value of each gray level and its number of occurrences in the lung parenchyma image. They form the basis from which important intensity features of suspected pulmonary nodules can be extracted. A lung CT image is itself a large gray-scale image in which differences in gray values reflect the characteristics of different tissues; for example, high gray levels may correspond to pulmonary nodules or vascular tissue. Gray-scale features are the most direct features of CT images and are quite effective [57,92]. The expressions of the extracted intensity features are presented in the 2nd column of Table 3-1.

Table 3-1: Extracted features of nodule candidates.

Column 1: Shape features
Area [92]: $A = \sum_{o \in O_m} o$
Elongation [18]: $E = a / b$
Image moments [17]: $m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x, y)$
Perimeter [108]: $L(I) = \sum_{(x, y) \in C} I(x, y)$
Central moments [17]: $\mu_{pq} = \sum_{x}\sum_{y} (x - x_0)^{p} (y - y_0)^{q} f(x, y)$
Circularity [108]: $C = \dfrac{4\pi A}{L^{2}}$
Centroid [17]: $x_0 = m_{10}/m_{00}, \quad y_0 = m_{01}/m_{00}$
Roundness [108]: $R = \dfrac{4A}{\pi L^{2}}$
Major axis length [30]: $a = 2\left[\dfrac{2\left(\mu_{20} + \mu_{02} + \sqrt{(\mu_{20} - \mu_{02})^{2} + 4\mu_{11}^{2}}\right)}{\mu_{00}}\right]^{1/2}$
Minor axis length [30]: $b = 2\left[\dfrac{2\left(\mu_{20} + \mu_{02} - \sqrt{(\mu_{20} - \mu_{02})^{2} + 4\mu_{11}^{2}}\right)}{\mu_{00}}\right]^{1/2}$
Volume [18]: $Vol = \sum_{o \in O} o$
Compactness [18]: $Cmp = \dfrac{Vol}{\frac{4}{3}\pi r^{3}}$

Column 2: Intensity features
Mean [57]: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Variance [57]: $S^{2} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}{n - 1}$
Maximum value inside [92]: $I_{max} = \max(I)$
Minimum value inside [92]: $I_{min} = \min(I)$
Skewness [57]: $Skew = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^{3}}{(n - 1)s^{3}}$, where $s = \sqrt{S^{2}}$
Kurtosis [57]: $Kurt = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^{4}}{(n - 1)s^{4}}$

Column 3: Texture features
Normalized GLCM [125]: $\hat{P}_{\delta}(i, j) = \dfrac{P_{\delta}(i, j)}{\sum_{i=0}^{L-1}\sum_{j=0}^{L-1} P_{\delta}(i, j)}$
Energy [126]: $ene = \sum_{i=0}^{L-1}\sum_{j=0}^{L-1} \hat{P}_{\delta}^{2}(i, j)$
Entropy [125]: $ent = -\sum_{i=0}^{L-1}\sum_{j=0}^{L-1} \hat{P}_{\delta}(i, j) \log \hat{P}_{\delta}(i, j)$
Inverse difference moment [125]: $idm = \sum_{i=0}^{L-1}\sum_{j=0}^{L-1} \dfrac{\hat{P}_{\delta}(i, j)}{1 + (i - j)^{2}}$
Contrast [125]: $con = \sum_{n=0}^{L-1} n^{2} \left\{ \sum_{i=0}^{L-1}\sum_{j=0}^{L-1} \hat{P}_{\delta}(i, j) \right\}, \quad |i - j| = n$
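The intensity expressions in the 2nd column of Table 3-1 can be sketched as follows, assuming the input is the array of gray values (e.g., HU) inside a candidate's mask; the function name is hypothetical. Skewness and kurtosis follow the table's (n − 1)-normalized definitions with $s = \sqrt{S^2}$, so the kurtosis here is not the excess kurtosis reported by some statistics libraries.

```python
import numpy as np


def intensity_features(values):
    """Mean, variance, max, min, skewness and kurtosis as defined in Table 3-1."""
    x = np.asarray(values, dtype=float).ravel()
    n = x.size
    mean = x.sum() / n
    dev = x - mean
    variance = np.sum(dev ** 2) / (n - 1)        # S^2 with an (n - 1) denominator
    s = np.sqrt(variance)
    if s > 0:
        skewness = np.sum(dev ** 3) / ((n - 1) * s ** 3)
        kurtosis = np.sum(dev ** 4) / ((n - 1) * s ** 4)
    else:                                        # constant region: shape moments undefined
        skewness = kurtosis = 0.0
    return {
        "mean": float(mean),
        "variance": float(variance),
        "max": float(x.max()),
        "min": float(x.min()),
        "skewness": float(skewness),
        "kurtosis": float(kurtosis),
    }
```

In use, the input would be something like `ct_slice[candidate_mask]`, where both names are placeholders for the pre-processed slice and the candidate's binary mask.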
3.3.2.3 Texture Features
Texture features are statistical descriptions of the spatial distribution of pixel gray values in an image. They are based on local statistics that combine information from multiple pixels, which makes them more informative than the gray-scale values or the position of a region alone. When matching regional objects, texture features give a stronger matching and detection ability. At the same time, as statistical features, texture features have a degree of rotational invariance and strong resistance to noise.
Texture models of CT images can also be used to distinguish benign from malignant pulmonary nodules (though this is beyond the scope of this dissertation); such models may help not only to characterize pulmonary nodules but also to predict their behavior. When examining pulmonary nodules, doctors intuitively judge the brightness and darkness of nodules (i.e., their gray values) and the shape of suspected pulmonary nodules, and this regional, qualitative pathological information is very important for physician-assisted diagnosis. Texture features measure the smoothness and regularity of an image's appearance, and such feature values can therefore be used in a computer-aided diagnosis system.
The basic matrix used in texture feature calculation is the gray-level co-occurrence matrix (GLCM), which captures both gray-scale statistics and the spatial distribution of gray levels [125]. Four texture parameters, namely energy, contrast, entropy and inverse difference moment, are calculated from the normalized GLCM [125,126]. Their expressions are presented in the third column of Table 3-1, where $P_{\delta}(i, j)$ represents the GLCM, $\hat{P}_{\delta}(i, j)$ its normalized form, i and j represent image gray levels, L represents the number of gray levels of the image (so i and j range from 0 to L − 1), and δ represents the spatial position relationship between the two pixels. Because the GLCM is a regional property, it is computed for the directional angles 0°, 45°, 90° and 135° with a pixel distance of 2. A brief description of the extracted texture features is presented in the following paragraphs.
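The GLCM construction described above can be sketched in NumPy as follows. The quantization to 16 gray levels, the use of a symmetric matrix and the averaging of the four normalized directional matrices into a single one are assumptions made for illustration; the text fixes only the four angles and the pixel distance of 2. Offsets are (row, column) steps with rows increasing downwards, so 45° points up and to the right. All function names are hypothetical.

```python
import numpy as np


def direction_offsets(d=2):
    """(dy, dx) steps for 0, 45, 90 and 135 degrees at pixel distance d."""
    return {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}


def quantize(img, levels):
    """Linearly map intensities to integer gray levels 0 .. levels - 1."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=int)
    q = np.floor((img - lo) / (hi - lo) * levels).astype(int)
    return np.clip(q, 0, levels - 1)


def glcm(q, levels, dy, dx, symmetric=True):
    """Count pairs with level i at (y, x) and level j at (y + dy, x + dx)."""
    h, w = q.shape
    y0, y1 = max(0, -dy), min(h, h - dy)        # rows where both pixels are inside
    x0, x1 = max(0, -dx), min(w, w - dx)        # columns where both pixels are inside
    first = q[y0:y1, x0:x1]
    second = q[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    P = np.zeros((levels, levels), dtype=float)
    np.add.at(P, (first.ravel(), second.ravel()), 1.0)
    return P + P.T if symmetric else P


def normalized_glcm(img, levels=16, d=2):
    """Mean of the normalized GLCMs over the four directions."""
    q = quantize(img, levels)
    mats = []
    for dy, dx in direction_offsets(d).values():
        P = glcm(q, levels, dy, dx)
        mats.append(P / P.sum())
    return np.mean(mats, axis=0)
```

An alternative design computes each texture feature per direction and averages the features instead of the matrices; the text does not say which was used.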
Energy indicates the uniformity of the gray-level distribution of the image pixels and the coarseness of the texture. It is the sum of squares of the normalized GLCM entries. If the GLCM entries are all of similar magnitude, the energy is small; if a few entries are much larger than the others, as happens in a region of uniform gray values, the energy is large. This value therefore indicates the uniformity of the region and the coarseness of its texture.
Contrast indicates the sharpness of the image and the depth of the texture grooves. Deeper texture grooves give a higher contrast and a clearer visual effect, whereas shallow grooves give a low contrast and a blurred effect. Contrast is small when the GLCM is concentrated on its diagonal, i.e., when neighboring pixels have similar gray values, and it is large when large GLCM entries lie far from the diagonal.
Entropy measures the amount of information in the image. It is maximal when all the entries of the GLCM are approximately equal, i.e., when the co-occurrences are maximally random; the more dispersed the distribution of GLCM entries, the larger the entropy. Entropy therefore indicates the complexity of the texture in the image.
Inverse difference moment represents the homogeneity of the image texture and measures the degree of local change in the texture. A larger inverse difference moment indicates little variation between different regions of the texture, i.e., the texture is locally very uniform.
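Given a normalized GLCM, the four descriptors above reduce to a few array operations. The sketch below assumes the natural logarithm for entropy (Table 3-1 does not state the base) and uses the identity $\sum_{n} n^2 \sum_{|i-j|=n} \hat{P}_{\delta}(i,j) = \sum_{i,j} (i-j)^2 \hat{P}_{\delta}(i,j)$ for contrast. The function name is hypothetical.

```python
import numpy as np


def texture_features(P_hat):
    """Energy, contrast, entropy and IDM of a normalized GLCM (entries sum to 1)."""
    P = np.asarray(P_hat, dtype=float)
    i, j = np.indices(P.shape)
    nz = P > 0                                   # treat 0 * log(0) as 0
    return {
        "energy": float(np.sum(P ** 2)),
        "contrast": float(np.sum((i - j) ** 2 * P)),
        "entropy": float(-np.sum(P[nz] * np.log(P[nz]))),
        "idm": float(np.sum(P / (1.0 + (i - j) ** 2))),
    }
```

A purely diagonal GLCM (every pixel pair shares a gray level) gives zero contrast and an IDM of one, while a uniform GLCM gives the maximal entropy $\log L^2$, matching the qualitative descriptions above.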
3.3.3 Pulmonary Nodules Feature Selection
Hundreds of features of suspected pulmonary nodules have been reported in the literature. It is of critical importance to select descriptive features that have a considerable effect on the detection efficiency of the system: some features have no effect, while others play a very important role in computer-aided diagnosis, so optimizing the feature set is a key issue. We have selected a hybrid set of lung nodule features, obtained through experimentation and correlation analysis [127]. Features describing similar basic properties are expected to be more strongly correlated with one another than with features of a different kind; therefore, the correlation between features belonging to the same one of the three main feature types should be relatively high. The correlation of two features measures their degree of association, and quantifying it makes it suitable for computer processing. Correlation can be linear or nonlinear; this thesis mainly considers linear correlation. Among the many statistical methods available, one of the most commonly used is Pearson's product-moment correlation, which is used in this thesis to compute the correlation between features. The correlation coefficient of any two samples A and B can be calculated as [127]:
$r_{AB} = \dfrac{\sum_{i} (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i} (a_i - \bar{a})^{2} \sum_{i} (b_i - \bar{b})^{2}}}$ (3.21)

where $a_i$ and $b_i$ are the i-th values of A and B, and $\bar{a}$ and $\bar{b}$ represent the means of A and B, respectively. This dissertation collects and collates the characteristic information of suspected pulmonary nodules and extracts samples from it for correlation analysis, as shown in Table 3-2.

Table 3-2: Feature Correlation Information
Correlation   Variance  Area    Major axis  Circularity  Compactness  Energy  Contrast  Entropy  IDM
Variance       1.000    0.598    0.665      -0.512       -0.471       -0.326  -0.072     0.372   -0.175
Area           0.598    1.000    0.951      -0.385       -0.397       -0.365  -0.055     0.484   -0.287
Major axis     0.665    0.951    1.000      -0.569       -0.571       -0.429  -0.124     0.531   -0.191
Circularity   -0.512   -0.385   -0.569       1.000        0.975        0.229  -0.105    -0.228    0.031
Compactness   -0.471   -0.397   -0.571       0.975        1.000        0.134  -0.155    -0.142    0.113
Energy        -0.326   -0.365   -0.429       0.229        0.134        1.000   0.043    -0.964   -0.281
Contrast      -0.072   -0.055   -0.124      -0.105       -0.155        0.043   1.000    -0.068   -0.161
Entropy        0.372    0.484    0.531      -0.228       -0.142       -0.964  -0.068     1.000    0.387
IDM           -0.175   -0.287   -0.191       0.031        0.113       -0.281  -0.161     0.387    1.000

The correlation coefficients in Table 3-2 lie in the range [-1, 1]; the closer a value is to 0, the weaker the linear relationship between the two features. Circularity and compactness are the most strongly correlated pair (0.975), and area and major axis length (0.951) and energy and entropy (-0.964) are also highly correlated in magnitude, whereas the correlations between the other features are mostly weak to moderate.
After the correlation analysis, rigorous experimentation was carried out to select the feature set that gave the best results in classifying lung nodule candidates. Our approach was to start from a large initial set of features representing the state of the art in features used by the most successful published CAD systems. From this initial pool, we performed feature selection and trimmed the feature set down to the optimal subset for nodule detection, considering both the sensitivity and the FP/scan. The extracted nodule features can be broadly classified into shape, intensity and texture related quantities, as shown in Table 3-1. These features were extracted from all the lung nodule candidates and used for classification.
3.4 Classification of pulmonary nodules
3.4.1 Support Vector Machine Classifier
Once the feature vectors have been formed, they are used as input for classification and false positive reduction. In our proposed method, we have used an SVM classifier, as it is computationally efficient and gives better results [128,129]. Although we also experimented with other supervised classifiers such as K-Nearest-Neighbour (KNN) [130], Decision Tree [131] and Linear Discriminant Analysis (LDA) [132], the results clearly indicate the superiority of SVM over these classifiers. The SVM maps samples that are not linearly separable in the original feature space into a higher-dimensional space in which they become linearly separable, and then separates true and false pulmonary nodules with a hyperplane in that space. The main reason we preferred SVM in this dissertation is that pulmonary nodule classification is a binary problem (nodule vs. non-nodule), and SVM performs considerably well when there are only two classes to predict. SVM can also efficiently perform non-linear classification using the kernel trick [128]. There are four main types of kernel functions:
(1) Linear kernel function can be expressed as:
$K(x_i, x_j) = x_i^{T} x_j$ (3.22)
(2) Polynomial kernel function can be expressed as:
$K(x_i, x_j) = (\gamma x_i^{T} x_j + r)^{d}$ (3.23)
(3) Gaussian Radial Basis Function (RBF) kernel function can be expressed as:
$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right)$ (3.24)
(4) Hyperbolic Tangent kernel function can be expressed as:
$K(x_i, x_j) = \tanh(\gamma x_i^{T} x_j + r)$ (3.25)
where γ, r and d are kernel function parameters. Of the four kernel functions, the RBF kernel performs considerably well because it can map samples that are not separable in the low-dimensional input space into a high-dimensional space in which they become separable. An RBF-kernel SVM requires only two parameters to be set: the penalty factor C and the kernel parameter γ. It is therefore simpler to tune than the polynomial and hyperbolic tangent kernels, and it handles the irregular class boundaries of lung nodule data better.
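The four kernels (3.22) to (3.25) can be written directly as functions of two feature vectors; the sketch below is illustrative only, and the function names are hypothetical. In MATLAB-style naming, which the kernel labels in Chapter 4 appear to follow, SVM-Quadratic and SVM-Cubic would correspond to the polynomial kernel with d = 2 and d = 3; that mapping is an assumption.

```python
import numpy as np


def linear_kernel(xi, xj):
    """(3.22) K = xi^T xj"""
    return float(np.dot(xi, xj))


def polynomial_kernel(xi, xj, gamma, r, d):
    """(3.23) K = (gamma * xi^T xj + r)^d"""
    return float((gamma * np.dot(xi, xj) + r) ** d)


def rbf_kernel(xi, xj, gamma):
    """(3.24) K = exp(-gamma * ||xi - xj||^2)"""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def tanh_kernel(xi, xj, gamma, r):
    """(3.25) K = tanh(gamma * xi^T xj + r)"""
    return float(np.tanh(gamma * np.dot(xi, xj) + r))
```

For x = (1, 2) and y = (3, 4), the linear kernel gives 11 and the quadratic kernel with γ = 1, r = 1 gives (11 + 1)² = 144.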
In this dissertation, we have used polynomial and radial basis function kernels. The penalty factor and kernel scale parameters have been optimized using an exhaustive grid search, in which the ranges of the penalty factor and the kernel parameter are $C = 10^{0}, \ldots, 10^{2}$ and $\gamma = 2^{-3}, \ldots, 2^{3}$, respectively². The step between two consecutive values of the penalty factor and the kernel parameter is set to 1 and 0.2, respectively. Every (C, γ) pair is tried, and the pair with the best cross-validation accuracy is selected [133,134]. This optimized pair of parameters is then used to train the classifier using only the training data.
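A minimal sketch of the exhaustive (C, γ) search with k-fold cross-validation is shown below, using scikit-learn's `GridSearchCV`, `KFold` and `SVC`. The grids mirror the ranges and steps stated above (C from 1 to 100 in steps of 1, γ from 0.125 in steps of 0.2 up to at most 8). Two assumptions should be noted: the γ in this text is called a kernel scale, which in some toolboxes (e.g., MATLAB) divides the predictors rather than acting as scikit-learn's `gamma`, so the numerical values may not transfer one-to-one; and `coef0 = 1` for the polynomial kernel is an illustrative choice. The function name is hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Grids following the text: C = 1, 2, ..., 100 and gamma = 0.125, 0.325, ..., 7.925
C_GRID = np.arange(1, 101, 1)
GAMMA_GRID = np.arange(0.125, 8.0 + 1e-9, 0.2)


def grid_search_svm(X_train, y_train, C_values=C_GRID, gamma_values=GAMMA_GRID,
                    k=5, kernel="rbf", degree=3, coef0=1.0, seed=0):
    """Exhaustive (C, gamma) search scored by mean k-fold cross-validation accuracy.

    With refit=True (the GridSearchCV default), the returned estimator is
    retrained on the whole training set with the best pair; no test data is used.
    """
    param_grid = {"C": list(C_values), "gamma": list(gamma_values)}
    cv = KFold(n_splits=k, shuffle=True, random_state=seed)
    search = GridSearchCV(SVC(kernel=kernel, degree=degree, coef0=coef0),
                          param_grid, scoring="accuracy", cv=cv)
    search.fit(X_train, y_train)
    return search.best_params_, search.best_score_, search.best_estimator_
```

The full grid has 100 × 40 = 4000 pairs, each fitted k times, which is consistent with the later narrowing of the search range to reduce the number of iterations.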
To train the classifier, we use the annotations provided by the radiologists. Normally, the number of nodule samples is much smaller than the number of non-nodules, which affects the performance of a classifier. To remove this bias, we balanced our dataset by randomly selecting an equal number of nodules and non-nodules. Next, the balanced dataset is randomly split into training and testing datasets: 70% of the data is used for training, and 30% is held out as a test set for the final evaluation of the system. In the training phase, we used a k-fold cross-validation scheme for model selection and validation, with k = 2, 5 and 7. In k-fold cross-validation, the training dataset is randomly divided into k equal-sized subsamples. One subsample is held out as validation data for model assessment, and the remaining 𝑘 − 1 subsamples are used to train the classifier. This process is repeated 𝑘 times, and the 𝑘 results are averaged to produce a single estimate. The advantage of this scheme is that every sample is used for both training and validation, and each sample is used for validation exactly once.
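The balancing, 70/30 hold-out split and k-fold partitioning described above can be sketched as follows. Labels are assumed to be 1 for nodules and 0 for non-nodules, and the function names are hypothetical. The split is purely random (not stratified), matching the description; because the dataset is balanced first, both classes remain roughly equally represented in each part.

```python
import numpy as np


def balance(X, y, rng):
    """Randomly undersample so nodules (1) and non-nodules (0) are equally many."""
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    n = min(pos.size, neg.size)
    keep = np.concatenate([rng.choice(pos, n, replace=False),
                           rng.choice(neg, n, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]


def split_train_test(X, y, rng, train_frac=0.7):
    """Random hold-out split: train_frac for training, the rest for the final test."""
    idx = rng.permutation(len(y))
    n_train = int(round(train_frac * len(y)))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], y[tr], X[te], y[te]


def k_fold_indices(n, k, rng):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

Only the training part returned by `split_train_test` is passed to `k_fold_indices`; the test part stays untouched until the final evaluation.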
In the training phase, the input to the classifier consists of the feature vectors and the known class labels. Once the classifier is trained and its hyper-parameters are tuned, the final evaluation of the classifier is done using the test set only. More specifically, the 30% of the data held out initially is used for the final evaluation of the classifier, and the corresponding results are reported in the next chapter. Note that at this stage the input to the classifier consists of the feature vectors only.
2 After the optimization of C and γ for SVM-Gaussian, the ranges for C and γ were reduced to $C = 10^{0}, \ldots, 10^{1}$ and $\gamma = 2^{-3}, \ldots, 2^{2}$ to reduce the number of iterations.
It is also noteworthy that feature selection was done using the training set only. Once the optimal feature set for nodule detection, considering both the sensitivity and the FP/scan, had been obtained from the training dataset, it was fixed and applied to the test set.
The performance of a classifier can be measured with the standard performance metrics, mainly sensitivity, specificity, accuracy and receiver operating characteristic (ROC) curves [135]. ROC curves are obtained by plotting the sensitivity against the false positive rate (1 − specificity) for different threshold values, and the area under the ROC curve (AUC) summarizes the performance of the classifier. These metrics can be calculated as follows:
$\text{sensitivity} = \dfrac{TP}{TP + FN}$ (3.26)
$\text{specificity} = \dfrac{TN}{TN + FP}$ (3.27)
$\text{accuracy} = \dfrac{TP + TN}{TP + FP + TN + FN}$ (3.28)
where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. In summary, the main steps of the classification stage are:
(1) Formation of the feature vectors that serve as input to the classifier in the training phase.
(2) Optimization of the penalty factor and kernel scale parameters using grid search.
(3) Balancing of the dataset by randomly selecting an equal number of nodules and non-nodules, and splitting it into training and test datasets.
(4) Training of the classifier using the k-fold cross-validation scheme for model selection and validation, using the training data only.
(5) Testing of the trained classifier to obtain the final classification results, using only the test data held out initially.
This process is shown in Figure 3-8.
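Equations (3.26) to (3.28) and the AUC can be computed as follows. The AUC here is obtained from the Mann-Whitney rank statistic, which equals the area under the empirical ROC curve (ties counted as one half); this is one standard way of computing it, not necessarily the one used to produce the figures in Chapter 4. Labels are assumed to be 1 for nodules, and the function names are hypothetical.

```python
import numpy as np


def classification_metrics(y_true, y_pred):
    """Sensitivity (3.26), specificity (3.27) and accuracy (3.28) for binary labels."""
    t = np.asarray(y_true).astype(bool)
    p = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(t & p))
    tn = int(np.sum(~t & ~p))
    fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def roc_auc(y_true, scores):
    """AUC: probability that a random nodule scores higher than a random non-nodule."""
    t = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[t], s[~t]
    wins = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((wins + 0.5 * ties) / (pos.size * neg.size))
```

For example, with labels (1, 1, 0, 0) and scores (0.9, 0.8, 0.3, 0.85), three of the four nodule/non-nodule pairs are ranked correctly, so the AUC is 0.75.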
Figure 3-8 Flow Chart of Classification Process [blocks: Feature Vector, Feature Optimization, Rule based Classifier, SVM Classifier, Classification Results]
3.4.1.1 Sample Experimental Results
Figure 3-9 presents three different classification cases, where (a), (d) and (g) show the original slices, (b), (e) and (h) show the pulmonary nodules marked by the system, and (c), (f) and (i) show the annotation of one of the four expert radiologists from the pulmonary nodule XML file (each radiologist annotates the contour differently). Panel (c) shows the annotation of a pulmonary nodule with a diameter of about 3 mm, (f) shows the annotation of a pulmonary nodule with a diameter of more than 10 mm, and (i) shows the annotation of a pulmonary nodule with a very small diameter. Comparing the system output with the annotations, the marking in (b) is basically consistent with the annotation in (c), whereas the annotated profile differs slightly in (f) and is inconsistent in (i).
Figure 3-9: Pulmonary Nodule Classification Results [panels (a) to (i)]
Chapter 4: RESULTS AND DISCUSSION
This chapter presents the results of our proposed method and a discussion of them. The first section briefly analyzes the dataset used (LIDC) and its image format (DICOM). The next section presents the experimental environment, including the graphical user interface developed for the user, along with the results and corresponding analysis of all its modules. The last section presents the results of our proposed method and their comparison with other CAD systems.
4.1 Dataset and Hit Criteria
We have carried out an extensive evaluation of our proposed system on the Lung Image Database Consortium (LIDC) dataset [27]. LIDC is a publicly available database accessible from The Cancer Imaging Archive (TCIA). We used 850 scans³ (LIDC-IDRI-0001 to LIDC-IDRI-0844) of this dataset, which contain nodules of size 3-30 mm fully annotated by four expert radiologists in two consecutive sessions. The total number of nodules in the 850 CT scans is 2242. Each CT scan consists of 150-300 slices; each slice is 512 × 512 pixels with 4096 gray levels in Hounsfield units (HU). The pixel spacing is 0.78-1 mm, and the reconstruction interval varies from 1 to 3 mm.
We have considered all the nodules (i.e. 2242) including all the agreement levels (AL) among
the observers in evaluation of our proposed system. These nodules consist of the group of
nodules which have been marked by all four radiologists (full agreement between observers,
denoted by AL4) and the group of nodules marked by three out of four (majority agreement
between observers, denoted by AL3) and the group of nodules marked by two or only one
radiologist (minority agreement between observers, denoted by AL2 and AL1 respectively).
3 Cases LIDC-IDRI-0132, 0151, 0315, 0332, 0355, 0365, 0442 and 0484 each appear twice as distinct cases in the dataset, and cases LIDC-IDRI-0238 and 0585 do not exist in the dataset.
In our evaluation, a detected candidate is considered a nodule if its distance to any nodule in the dataset is smaller than 1.5 times the radius of that nodule. This value was determined experimentally and has also been used in other studies [12,18,30]. We call this a near hit. A detected candidate that makes a hit is counted as a true positive; otherwise, it is counted as a false positive. If there are multiple hits on the same reference nodule, only the candidate that overlaps most with the reference nodule's merged manual segmentation is counted as a true positive, and the other candidates are ignored, i.e., they are counted neither as true positives nor as false positives for scoring purposes.
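The near-hit scoring rule can be sketched as follows, assuming candidate and reference centroids in millimetres and a precomputed matrix of overlaps between each candidate and each reference nodule's merged manual segmentation (how that overlap is measured, e.g. in shared voxels, is an assumption left to the caller). The hypothetical `score_candidates` function returns the number of reference nodules detected, the number of false positives and the number of ignored duplicate hits.

```python
import numpy as np


def score_candidates(cand_centers, ref_centers, ref_radii, overlap, factor=1.5):
    """Near-hit scoring of detected candidates against reference nodules.

    cand_centers: (M, D) candidate centroids; ref_centers: (N, D) reference
    centroids; ref_radii: (N,) reference radii in the same units; overlap: (M, N)
    overlap of each candidate with each reference's merged manual segmentation.
    """
    c = np.asarray(cand_centers, dtype=float)
    r = np.asarray(ref_centers, dtype=float)
    radii = np.asarray(ref_radii, dtype=float)
    ov = np.asarray(overlap, dtype=float)
    dist = np.linalg.norm(c[:, None, :] - r[None, :, :], axis=2)   # (M, N)
    hit = dist < factor * radii[None, :]                             # near-hit test
    false_positives = int(np.sum(~hit.any(axis=1)))                  # hits nothing
    counted = set()
    detected = 0
    for n in range(r.shape[0]):
        hitters = np.flatnonzero(hit[:, n])
        if hitters.size:
            detected += 1
            # among multiple hits, only the largest-overlap candidate counts
            counted.add(int(hitters[np.argmax(ov[hitters, n])]))
    hitting = set(np.flatnonzero(hit.any(axis=1)).tolist())
    ignored = len(hitting - counted)   # neither TP nor FP
    return detected, false_positives, ignored
```

Dividing the number of detected reference nodules by the total number of reference nodules gives the detection rate, and summing the false positives over all scans and dividing by the number of scans gives the FPs/scan.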
4.1.1 DICOM Resources
The LIDC scans are in the DICOM (Digital Imaging and Communications in Medicine) format [14], a medical imaging standard widely used across the globe. We have analyzed the characteristics of DICOM image sequences (details can be found in Appendix A), which is another important aspect and lays the foundation for our proposed algorithm.
With the progress of science and technology, developed countries have increased their investment in two major areas, education and health care, and the level of health care in particular reflects a nation's state of science and technology. In the past, medical equipment manufacturers faced various problems because of their different image formats and non-uniform equipment interfaces, such as communication between devices, image sharing and the continuity of medical development. These problems led to the imbalance and waste of medical resources.
As a result, the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) formed a committee that ultimately developed “DICOM”, a unified standard for medical image storage, display and transmission, which describes in detail the interaction modes, medical image formats and communication fundamentals among different manufacturers. The coding structure of DICOM-CT images and an analysis of the XML information files that LIDC provides as a reference can be found in Appendix A and Appendix B, respectively.
4.2 Experimental Environment
We have developed a fully automated system for lung nodule detection. The user interface of the developed system is shown in Figure 4-1. The system has five main modules and some supplementary functions for user support, each of which is described in the following sections.
Figure 4-1 User Interface of Lung CAD
4.2.1 Image Preprocessing Module
The image preprocessing module includes the image denoising function shown in Figure 4-2, which uses median filtering as the image filtering method. This module removes noise present in the image, and its output serves as the input to the next module.
Figure 4-2 Sample Output of Image Pre-processing Module
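Median filtering replaces each pixel by the median of its neighbourhood, which suppresses impulse noise while preserving edges. A NumPy sketch with a 3 × 3 window and edge replication at the borders is shown below; the window size, border handling and function name are assumptions, as the text does not specify them.

```python
import numpy as np


def median_filter(img, size=3):
    """Median filter with a size x size window and edge-replicated borders."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    img = np.asarray(img, dtype=float)
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    # Stack every shifted copy of the image, one per window position
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(size) for dx in range(size)])
    return np.median(windows, axis=0)
```

An isolated bright pixel (salt noise) is removed entirely, because it is the only outlier among the nine values in each window that contains it.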
4.2.2 Lung Segmentation Module
The lung segmentation module shown in Figure 4-3 consists of two main functions: (i) lung segmentation and (ii) edge smoothing. The lung segmentation function consists of a series of steps described earlier in detail, and its output is the lung region segmented from the pre-processed CT image. The second function smooths the lung contour using the rolling ball algorithm so as to include any juxta-pleural nodules present on the lung contour.
Figure 4-3 Sample Output of Lung Segmentation Module
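Rolling-ball smoothing of a binary lung mask is commonly approximated by a morphological closing (dilation followed by erosion) with a disk of the ball's radius, which fills boundary indentations narrower than the ball, such as those left when a juxta-pleural nodule is excluded by thresholding. The sketch below takes that approach in NumPy; the radius and function names are illustrative assumptions, not the parameters used in this work.

```python
import numpy as np


def disk_offsets(radius):
    """(dy, dx) offsets of a disk-shaped structuring element, plus the pad width."""
    r = int(np.ceil(radius))
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    inside = ys ** 2 + xs ** 2 <= radius ** 2
    return list(zip(ys[inside].tolist(), xs[inside].tolist())), r


def _shifted(mask, offsets, pad, fill):
    """All copies of mask shifted by each offset, padding the border with fill."""
    padded = np.pad(mask, pad, mode="constant", constant_values=fill)
    h, w = mask.shape
    return [padded[pad + dy:pad + dy + h, pad + dx:pad + dx + w] for dy, dx in offsets]


def smooth_lung_contour(lung_mask, radius=5):
    """Morphological closing of a binary lung mask with a disk of the given radius."""
    mask = np.asarray(lung_mask, dtype=bool)
    offsets, pad = disk_offsets(radius)
    dilated = np.logical_or.reduce(_shifted(mask, offsets, pad, False))
    # Pad with True during erosion so regions touching the image border are not eroded
    return np.logical_and.reduce(_shifted(dilated, offsets, pad, True))
```

Closing never removes foreground pixels, so the smoothed mask always contains the original lung region; the radius controls how wide an indentation can be filled.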
4.2.3 Image Enhancement Module
The image enhancement module shown in Figure 4-4 consists of one main function, namely the “Multi-scale Dot Enhancement Filter”. This function takes the output of the lung segmentation module as input and uses our proposed multi-scale enhancement filter to enhance low-density nodules by increasing the overall contrast of the regions of interest (ROI) and weakening other irrelevant structures in the lung region.
Figure 4-4 Sample Output of Image Enhancement Module
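The following sketch shows a generic multi-scale, Hessian-based 2D dot filter of the kind such an enhancement module builds on; it is not the proposed filter of this dissertation. At each Gaussian scale σ it computes the Hessian eigenvalues λ1 and λ2 with |λ1| ≥ |λ2| and, where both are negative (a bright blob), returns λ2²/|λ1|, which is large for dot-like structures and close to zero for line-like ones, where λ2 ≈ 0. Responses are scale-normalized by σ² and the maximum over scales is kept; the scales and function names are assumptions.

```python
import numpy as np


def _gaussian_kernel(sigma):
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()


def _gaussian_smooth(img, sigma):
    """Separable Gaussian smoothing with reflective borders."""
    k = _gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")

    def conv(v):
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)
    return np.apply_along_axis(conv, 0, rows)


def dot_response(img, sigma):
    g = _gaussian_smooth(np.asarray(img, dtype=float), sigma)
    gy, gx = np.gradient(g)
    hyy, hyx = np.gradient(gy)
    hxy, hxx = np.gradient(gx)
    hxy = 0.5 * (hxy + hyx)                        # symmetrize the mixed derivative
    half_trace = 0.5 * (hxx + hyy)
    disc = np.sqrt((0.5 * (hxx - hyy)) ** 2 + hxy ** 2)
    la, lb = half_trace + disc, half_trace - disc  # eigenvalues of the 2x2 Hessian
    swap = np.abs(la) < np.abs(lb)
    l1 = np.where(swap, lb, la)                    # |l1| >= |l2|
    l2 = np.where(swap, la, lb)
    bright_blob = (l1 < 0) & (l2 < 0)
    z = np.where(bright_blob, l2 ** 2 / (np.abs(l1) + 1e-12), 0.0)
    return sigma ** 2 * z


def multiscale_dot_filter(img, sigmas=(1.0, 2.0, 3.0)):
    """Maximum scale-normalized dot response over the given Gaussian scales."""
    return np.max([dot_response(img, s) for s in sigmas], axis=0)
```

On a synthetic image containing a small bright blob and a bright line of similar width, the response is strong at the blob centre and essentially zero along the line, which is the behaviour this kind of filter relies on to favour nodules over vessels.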
In the following paragraphs, we present sample results and a discussion of the image pre-processing, lung segmentation and enhancement modules. These samples are taken from different scans of the LIDC-IDRI dataset used in this dissertation. Figure 4-5 presents top, middle and bottom slices of different scans. The first column of Figure 4-5 presents the results of the image preprocessing module on these slices. The preprocessing module is responsible for grayscale conversion of the input image and removal of noise through median filtering. Panels (a), (d), (g) and (j) of Figure 4-5 show the image pre-processing results on the middle slices of different scans, (m) and (p) show the results on the top slices, and (s), (v) and (y) show the results on the bottom slices. These results show that the image pre-processing module efficiently removes noise from the input CT images.
The second column of Figure 4-5 presents the results of the lung segmentation module on these slices. The lung segmentation module is responsible for segmenting the lung from the pre-processed input image and consists of several operations, including thresholding, background removal, lung contour smoothing and lung separation, which are explained in detail in the previous chapter. We present the final segmented lung, which is the main output of this module. Panels (b), (e), (h) and (k) of Figure 4-5 show the lung segmentation results on the middle slices of different scans, (n) and (q) show the results on the top slices, and (t), (w) and (z) show the results on the bottom slices. The results show that the segmentation module works well for the top and bottom slices, as shown in (n), (q) and (t), (w), (z) of Figure 4-5. For the middle slices, we present two scenarios: (i) the segmentation module segments the lung without any artifacts, and (ii) the segmentation module segments the lung with some artifacts. Panel (b) of Figure 4-5 shows a segmented lung output by the lung segmentation module; close comparison of (b) with the pre-processed image in (a) reveals almost no artifacts in the segmented lung. Similarly, no artifacts are found in (h) when it is compared with the pre-processed image in (g). The second scenario is presented in (e) of Figure 4-5; close comparison of (e) with the pre-processed image in (d) reveals some artifacts in the segmented lung. In the middle of the right side of the left segmented lung, a minute closing is observed between two disjoint regions; this artifact is highlighted with a red circle in the figure. Similarly, close comparison of (k) with the pre-processed image in (j) reveals the same artifact, which is also highlighted with a red circle.
The third column of Figure 4-5 presents the results of the image enhancement module on these slices. The image enhancement module is responsible for enhancing low-density nodules, which might otherwise remain undetected and affect the performance of the system. The enhancement method takes into consideration that nodules are roughly spherical whereas vessels are elongated, line-like structures; therefore, this module enhances dot-like, spherical objects. Panels (c), (f), (i) and (l) of Figure 4-5 show the image enhancement results on the middle slices of different scans, (o) and (r) show the results on the top slices, and (u), (x) and (zz) show the results on the bottom slices. The image enhancement module works well in almost all cases, irrespective of the slice location and of the different imaging protocols of different scans, which result in different density levels. Panel (zz) of Figure 4-5 presents one case in which the result of the image enhancement module is not desirable in a specific portion of a bottom slice, highlighted in the figure.
Fig. 4-5 Sample Results of Image Preprocessing, Segmentation and Enhancement Modules on Top, Bottom and Middle Slices of Different Scans [panels (a) to (zz); first column: preprocessing, second column: segmentation, third column: enhancement].
4.2.4 Lung Nodule Segmentation Module
The “ROI Extract” module is responsible for lung nodule candidate detection and feature extraction, whose outputs serve as input to the next module. This module takes the enhanced image as input and detects lung nodule candidates using optimal thresholding followed by a rule-based analysis that selects only good nodule candidates. Finally, features are extracted from the good nodule candidates and passed to the next module.
4.2.5 Lung Nodule Classification Module
This module produces the final output of the lung CAD system. It takes the extracted features as input, trains the classifier with the training data and gives the classification results on the test data.
4.2.6 Supplementary Functions Module
Apart from the main modules, we have also developed some supplementary functions to increase interactivity and support the user. These functions are placed at the top center and bottom left of the interface, as shown in Figure 4-1. The top center functions provide input and output image functionalities such as image sequencing, coordinate information in the output image and options to read, write and reset the image. The functions at the bottom left allow the user to go to the first or last slice of the CT sequence. There are also two specialized functions, namely “Broadcast” and “Automatic”: the former scans through all the images with one click, while the latter shows the final output of the CAD system with one click.
4.3 Classification Results of SVM with Different Kernel Functions
Our system detects 2112 nodules with 38682 non-nodules, which gives a detection rate of 94.20 % with 45.51 FPs/scan. Note that these non-nodules are further reduced by the classifier at the classification stage. The results are summarized in the following tables. Table 4-1 shows the classification results of SVM with different kernel functions on the test dataset, using 2, 5 and 7-fold cross-validation schemes in the training phase. The penalty factor and kernel parameters of these models were optimized using grid search. For SVM-Gaussian, the pair (C = 1, γ = 0.125) achieved the maximum cross-validation accuracy, as shown in Figure 4-6, and was used to train the model; for SVM-Cubic, the pair (C = 10, γ = 1.325) achieved the maximum cross-validation accuracy, as shown in Figure 4-7, and was used to train the model. Lastly, for SVM-Quadratic, the pair (C = 9, γ = 0.525), which achieved the maximum cross-validation accuracy, was selected to train the model; this pair is shown in Figure 4-8.
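As a quick consistency check, the candidate-detection figures quoted at the start of this section follow from the LIDC totals given in Section 4.1 (2242 nodules in 850 scans):

```latex
\text{detection rate} = \frac{2112}{2242} \approx 94.20\,\%, \qquad
\text{FPs/scan} = \frac{38682}{850} \approx 45.51
```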
Table 4-1: Classification Results of SVM on test dataset with different kernel functions using 2, 5 and 7-fold Cross Validation Scheme in training phase.
k-fold   Classifier      AUC     Accuracy (%)  Sensitivity (%)  Specificity (%)  FPs/Scan
2-Fold   SVM-Gaussian    0.995   97.10         98.15            96.01            2.19
2-Fold   SVM-Cubic       0.943   90.10         92.12            88.63            3.50
2-Fold   SVM-Quadratic   0.907   83.40         80.21            85.73            4.27
5-Fold   SVM-Gaussian    0.995   97.40         98.32            96.46            1.88
5-Fold   SVM-Cubic       0.949   90.10         92.28            88.31            3.36
5-Fold   SVM-Quadratic   0.916   83.80         80.90            86.16            3.98
7-Fold   SVM-Gaussian    0.994   97.40         98.41            96.40            1.91
7-Fold   SVM-Cubic       0.955   90.90         92.67            89.38            3.11
7-Fold   SVM-Quadratic   0.919   83.20         80.29            85.59            3.76

Figure 4-6: Grid Search Results for SVM-Gaussian showing the pair of parameters (C=1 and γ= 0.125) with best cross validation accuracy.
Figure 4-7: Grid Search Results for SVM-Cubic showing the pair of parameters (C=10 and γ= 1.325) with best cross validation accuracy.
Figure 4-8: Grid Search Results for SVM-Quadratic showing the pair of parameters (C=9 and γ= 0.525) with best cross validation accuracy.
Our system achieved a sensitivity of 98.41 % and an accuracy of 97.40 % using SVM with the Gaussian kernel function (7-fold scheme). The Gaussian kernel function outperforms the other kernel functions in accuracy, and the 5-fold and 7-fold cross-validation schemes yield the same maximum accuracy (97.40 %), with the 7-fold scheme giving the highest sensitivity. The performance of the system with the Gaussian kernel function remains almost the same in the 2 and 5-fold cross-validation schemes, with only slight differences in the metrics. The other two kernel functions, SVM-Cubic and SVM-Quadratic, achieve maximum sensitivities of 92.67 % (7-fold) and 80.90 % (5-fold), respectively.
Figure 4-9: ROC curves of the SVM classifier with different kernel functions using (a) 2-Fold Scheme, (b) 5-Fold Scheme, (c) 7-Fold Scheme. SVM-Q: Quadratic kernel function, SVM-G: Gaussian kernel function, SVM-C: Cubic kernel function.
ROC curves have been drawn to visualize the classifier's performance. Figure 4-9 shows the ROC curves for the SVM classifier with different kernel functions using the 2, 5 and 7-fold cross-validation schemes, respectively. The confidence interval for these and subsequent curves was set to 95%. The SVM Gaussian kernel function outperforms the other two kernel functions, while the SVM Quadratic kernel function shows the lowest performance.
4.4 Classification Results of SVM with Different Kernel Scale and Penalty Factor
In addition to performing a grid search for the selection of (C, 𝛾), we have also experimented
with different values of kernel scale and penalty factor, keeping one of them constant, to
observe the effect of each parameter. Table 4-2 shows the classification results of SVM-
Gaussian using different kernel scale values in the 2-fold cross validation scheme. We
evaluated our system using kernel scale values in the range 0.3 to 3, with the penalty parameter
kept constant at 1. The system achieves its maximum accuracy at the initial value 𝛾 = 0.3, and
its performance decreases as the kernel scale increases. The lowest accuracy, 83.30 %, is
obtained for 𝛾 = 3.
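To make the role of the kernel scale concrete: the kernel formula is not restated in this section, so the sketch below assumes the convention of, for example, MATLAB's `fitcsvm` (`KernelScale` option), in which the predictors are divided by a scale s before the Gaussian kernel is applied, giving K(x, z) = exp(-‖(x - z)/s‖²). Under this assumed convention a larger scale makes the kernel smoother, which is consistent with the drop in accuracy seen for larger values in Table 4-2. LIBSVM-style code writes the same kernel as exp(-𝛾‖x - z‖²), where a larger 𝛾 has the opposite effect; the two coincide when 𝛾 = 1/s². The function names are hypothetical.

```python
import math


def gaussian_kernel_scaled(x, z, scale):
    """Gaussian kernel with inputs divided by a kernel scale s:
    K(x, z) = exp(-||(x - z) / s||^2).  (Assumed convention, see text.)"""
    if scale <= 0:
        raise ValueError("kernel scale must be positive")
    squared_distance = sum(((a - b) / scale) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance)


def gaussian_kernel_gamma(x, z, gamma):
    """LIBSVM-style Gaussian kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    x, z = [0.0, 0.0], [1.0, 1.0]  # two toy feature vectors, ||x - z||^2 = 2
    for s in (0.3, 1.0, 3.0):
        # Larger scale -> kernel value closer to 1 -> smoother decision surface.
        print(s, round(gaussian_kernel_scaled(x, z, s), 4))
```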
Table 4-2: Classification Results of SVM-Gaussian on test dataset using different 𝛾 values
and 2-fold Cross Validation Scheme in training phase.
Penalty Parameter (C) | Kernel Scale (𝛾) | AUC | Accuracy (%) | Sensitivity (%) | Specificity (%) | FPs/scan
1 | 0.3 | 0.994 | 97.00 | 98.04 | 95.87 | 1.31
1 | 0.5 | 0.992 | 96.80 | 97.92 | 95.53 | 1.56
1 | 1 | 0.989 | 96.40 | 97.86 | 95.26 | 1.79
1 | 1.3 | 0.974 | 93.30 | 94.12 | 92.67 | 2.00
1 | 1.5 | 0.964 | 91.20 | 91.14 | 91.34 | 2.36
1 | 1.8 | 0.950 | 88.70 | 86.63 | 90.40 | 2.62
1 | 2 | 0.942 | 87.40 | 84.79 | 89.57 | 2.85
1 | 2.5 | 0.922 | 84.30 | 79.98 | 87.93 | 3.29
1 | 3 | 0.913 | 83.30 | 79.60 | 86.41 | 3.71
Table 4-3 shows the classification results of SVM-Gaussian using different penalty parameter
values in the 2-fold cross validation scheme. The penalty parameter varies from 1 to 4, while
the kernel scale is kept constant at 𝛾 = 1. The accuracy of the system increases with the
penalty parameter and reaches a maximum of 97.00 % for C = 4.
Table 4-3: Classification Results of SVM-Gaussian on test dataset using different C values
and 2-fold Cross Validation Scheme in training phase.
Penalty Parameter (C) | Kernel Scale (𝛾) | AUC | Accuracy (%) | Sensitivity (%) | Specificity (%) | FPs/scan
1 | 1 | 0.989 | 96.40 | 97.86 | 95.26 | 1.79
2 | 1 | 0.991 | 96.70 | 98.12 | 95.45 | 1.74
3 | 1 | 0.991 | 96.90 | 98.23 | 95.07 | 1.84
4 | 1 | 0.992 | 97.00 | 98.32 | 95.26 | 1.79
Figure 4-10 (a) shows the ROC curves for the SVM classifier with Gaussian kernel function using
different kernel scale values in the 2-fold cross validation scheme. The kernel scale varies
from 0.3 to 3 while the penalty parameter is kept constant. The performance of the classifier
decreases as 𝛾 increases. Figure 4-10 (b) shows the ROC curves for the SVM classifier with
Gaussian kernel function using different penalty parameter values in the 2-fold cross
validation scheme. The penalty parameter varies from 1 to 4 while the kernel scale is kept
constant. The performance of the classifier remains almost the same, with only a minor
increase as C grows.
Figure 4-10: ROC curves of the SVM classifier with Gaussian kernel function using 2-Fold
cross validation scheme with (a) different kernel scale 𝛾 values varying from 0.3 to 3;
(b) different penalty parameter C values varying from 1 to 4.
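The one-factor-at-a-time sweeps above, and the (C, 𝛾) grid search they complement, both reduce to evaluating a cross-validated score for each parameter setting and keeping the best one. The helper below is a hypothetical sketch of that bookkeeping: `cv_accuracy` stands for any routine that trains and validates the SVM for a given (C, 𝛾) pair and is not defined here. The usage lines simply replay the accuracies listed in Table 4-2 and Table 4-3 instead of retraining a classifier.

```python
from itertools import product


def grid_search(cv_accuracy, c_values, gamma_values):
    """Return (best_accuracy, best_c, best_gamma) over all (C, gamma) pairs.

    cv_accuracy: callable (c, gamma) -> cross-validated accuracy.
    Ties are broken in favour of the pair evaluated first.
    """
    best = None
    for c, gamma in product(c_values, gamma_values):
        accuracy = cv_accuracy(c, gamma)
        if best is None or accuracy > best[0]:
            best = (accuracy, c, gamma)
    if best is None:
        raise ValueError("empty parameter grid")
    return best


# Accuracies (%) reported in Table 4-2 (C = 1) and Table 4-3 (gamma = 1).
TABLE_4_2 = {0.3: 97.00, 0.5: 96.80, 1: 96.40, 1.3: 93.30, 1.5: 91.20,
             1.8: 88.70, 2: 87.40, 2.5: 84.30, 3: 83.30}
TABLE_4_3 = {1: 96.40, 2: 96.70, 3: 96.90, 4: 97.00}

if __name__ == "__main__":
    # Sweep gamma with C fixed at 1, then C with gamma fixed at 1.
    print(grid_search(lambda c, g: TABLE_4_2[g], [1], TABLE_4_2))
    print(grid_search(lambda c, g: TABLE_4_3[c], TABLE_4_3, [1]))
```

Both sweeps select the settings identified in the text (𝛾 = 0.3 with C = 1, and C = 4 with 𝛾 = 1). A full grid search simply passes lists for both parameters, at the cost of one cross-validation run per pair.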
4.5 Classification Results of Different Classifiers
In addition to the SVM classifier, we have also evaluated our system using other supervised
classifiers, namely K-Nearest-Neighbour, Decision Tree, Linear Discriminant and Boosted
Tree. Table 4-4 shows the classification results of these classifiers using the 2-fold cross
validation scheme. Among these classifiers, Decision Tree performs best, achieving the
highest accuracy and sensitivity, while Linear Discriminant performs poorly, with the lowest
sensitivity.
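The columns of Table 4-4, like those of the other result tables, follow from the confusion-matrix counts of the classified candidates. The sketch below states the standard definitions of accuracy, sensitivity and specificity. The FPs/scan column is taken here to be the number of false-positive candidates divided by the number of scans evaluated; that is the usual meaning, but it is an assumption in this sketch. The counts in the usage lines are made up for illustration and are not results of this study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # nodules classified as nodules
    fn: int  # nodules classified as non-nodules (missed)
    tn: int  # non-nodules correctly rejected
    fp: int  # non-nodules classified as nodules

    def sensitivity(self):
        """Fraction of true nodules that were detected."""
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        """Fraction of non-nodule candidates that were rejected."""
        return self.tn / (self.tn + self.fp)

    def accuracy(self):
        """Fraction of all candidates classified correctly."""
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total

    def fps_per_scan(self, num_scans):
        """Average number of false positives per CT scan (assumed definition)."""
        return self.fp / num_scans


if __name__ == "__main__":
    counts = ConfusionCounts(tp=95, fn=5, tn=170, fp=30)  # illustrative only
    print(f"accuracy    {100 * counts.accuracy():.2f} %")
    print(f"sensitivity {100 * counts.sensitivity():.2f} %")
    print(f"specificity {100 * counts.specificity():.2f} %")
    print(f"FPs/scan    {counts.fps_per_scan(num_scans=10):.2f}")
```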
Table 4-4: Classification Results of different classifiers on test dataset using 2-fold Cross
Validation Scheme in training phase.
Classifier | AUC | Accuracy (%) | Sensitivity (%) | Specificity (%) | FPs/scan
Decision Tree | 0.942 | 91.40 | 96.03 | 87.55 | 3.39
Linear Discriminant | 0.792 | 74.10 | 57.06 | 88.18 | 3.25
K-Nearest-Neighbour | 0.882 | 78.40 | 83.04 | 74.59 | 6.93
Boosted Tree-Ensemble | 0.959 | 89.60 | 91.67 | 87.93 | 3.29
4.6 Feature Ranking
Various features have been proposed in the literature to differentiate between nodules and other
anatomical structures, but research on measuring the effectiveness of these features has been
limited. In this dissertation, we have compared different classes of features to determine the
most relevant feature class. Table 4-5 shows the classification results of SVM-Gaussian using
different classes of features in the 2-fold cross validation scheme. Shape features show the best
performance in terms of sensitivity and accuracy compared to the other feature classes.
However, the results also show that it is very difficult to achieve high performance using a
single feature class alone; therefore, a hybrid approach to feature selection remains the better
choice.
Table 4-5: Classification Results of SVM-Gaussian on test dataset using different feature
classes and 2-fold Cross Validation Scheme in training phase.
Features | AUC | Accuracy (%) | Sensitivity (%) | Specificity (%) | FPs/scan
Intensity | 0.780 | 72.40 | 68.53 | 76.49 | 6.13
Shape | 0.902 | 84.60 | 80.59 | 87.87 | 3.62
Texture | 0.835 | 76.40 | 71.89 | 80.09 | 5.43
Figure 4-11 (a) shows the ROC curves for the different feature classes using the SVM classifier
with Gaussian kernel function in the 2-fold cross validation scheme. Shape features again
perform best compared to the other two feature classes. Figure 4-11 (b) shows the ROC curves
for the different classifiers in the 2-fold cross validation scheme. It is noteworthy that the
Linear Discriminant classifier performs poorly compared to the other classifiers, having the
lowest area under the curve.
Figure 4-11: (a) ROC curves of SVM classifier with Gaussian kernel function using 2-Fold
cross validation scheme with different feature classes (b) ROC curves of different classifiers
using 2-Fold cross validation scheme.
4.7 Comparison with Other Systems
From the review of existing methods, we found that it is very hard to compare our results with
previously published work because of the use of non-uniform performance metrics and different
evaluation criteria, including the dataset and the types of nodules considered. Despite this
constraint, we have compared the performance of our proposed system with other lung CAD
systems, as shown in Table 4-6. Our proposed system performs better than the other systems in
terms of sensitivity and FPs/scan. The systems closest in performance are those of Choi et al.
[12], Messay et al. [92] and Akram et al. [57]. Choi et al. [12] proposed a novel shape-based
feature extraction method. Eigenvalue decomposition of the Hessian matrix was used to obtain
surface elements that describe the local shape information of the target object, and features
were formed from these surface elements. The system was evaluated on 148 nodules in 84 scans
of the LIDC dataset. It shows good sensitivity, with a value of 97.5 %, but underperforms in
terms of false positives, with 6.76 FPs/scan. Messay et al. [92] computed a detailed set of 245
features (2D and 3D), mainly belonging to the shape, intensity and gradient feature classes. A
sequential forward selection method was then applied to obtain the optimum feature subset.
The system was evaluated on the LIDC dataset considering 143 nodules. It shows good
performance in terms of false positives, with 3 FPs/scan, but underperforms in terms of
sensitivity (82.66 %). Akram et al. [57] computed 2D shape features (area, diameter, perimeter,
circularity), 3D shape features (volume, compactness, bounding box dimensions, elongation,
principal axis length) and 2D and 3D intensity-based statistical features (mean inside, mean
outside, variance inside, kurtosis inside, skewness inside, minimum value inside, eigenvalues).
The system was evaluated on the LIDC dataset. It shows good sensitivity, with a value of
95.31 %, but the number of nodules used to validate the results (50) is too small. Table 4-6
summarizes the performance comparison of our proposed system with recently published lung
CAD systems.
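Sequential forward selection, which Messay et al. [92] used to reduce their 245-feature set, greedily adds the feature that most improves a selection criterion until no remaining feature helps. The following is a generic sketch of that idea, not Messay et al.'s implementation. The criterion `score` is a hypothetical callable (for example, the cross-validated accuracy of a classifier trained on the chosen features), and stopping at the first non-improving step is only one simple stopping rule.

```python
def sequential_forward_selection(features, score, max_features=None):
    """Greedy forward feature selection.

    features: iterable of candidate feature names.
    score: callable taking a tuple of feature names and returning a number
           to maximise (e.g. cross-validated accuracy).
    Stops when no remaining feature improves the score, or when
    max_features features have been selected.
    """
    remaining = list(features)
    selected = []
    best_score = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        # Try adding each remaining feature and keep the best candidate.
        candidate_scores = [(score(tuple(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(candidate_scores, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no feature improves the criterion any further
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


if __name__ == "__main__":
    # Hypothetical criterion: each feature has a fixed usefulness, and every
    # added feature costs 0.05 (a crude stand-in for overfitting risk).
    usefulness = {"area": 0.3, "compactness": 0.25, "mean_inside": 0.1, "noise": 0.0}

    def toy_score(subset):
        return sum(usefulness[f] for f in subset) - 0.05 * len(subset)

    print(sequential_forward_selection(usefulness, toy_score))
```

With the toy criterion, the useless "noise" feature is never added because including it would lower the score.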
Table 4-6: Performance Comparison of Different CAD Systems (N/A means Not Available).
CAD System | Year | Data Set | Nodule Size (mm) | Number of Nodules | Sensitivity (%) | FPs/scan
Proposed System | 2018 | LIDC | 3-30 | 2242 | 98.15 | 2.19
Setio et al. [111] | 2016 | LIDC | 3-30 | 1186 | 90.1 | 4.00
Dou et al. [136] | 2017 | LIDC | 3-30 | 1186 | 90.7 | 4.00
Bergtholdt et al. [137] | 2016 | LIDC | 3-30 | 690 | 85.9 | 2.50
Akram et al. [57] | 2016 | LIDC | 3-30 | 50 | 95.31 | N/A
Torres et al. [138] | 2015 | LIDC | 3-30 | 1749 | 80.00 | 8.00
van Ginneken et al. [110] | 2015 | LIDC | 3-30 | 1147 | 78.00 | 4.00
Choi et al. [12] | 2014 | LIDC | 3-30 | 148 | 97.50 | 6.76
Teramoto et al. [109] | 2014 | Private | 4-30 | 103 | 83.00 | 5.00
Choi et al. [18] | 2013 | LIDC | 3-30 | 151 | 95.28 | 2.27
Tartar et al. [108] | 2013 | Private | 2-20 | 95 | 89.60 | 7.90
Orozco et al. [107] | 2013 | LIDC, ELCAP | 2-30 | 75 | 84.00 | 7.00
Assefa et al. [105] | 2013 | ELCAP | N/A | 165 | 81.00 | 35.15
Choi et al. [30] | 2012 | LIDC | 3-30 | 76 | 94.10 | 5.45
Messay et al. [92] | 2010 | LIDC | 3-30 | 143 | 82.66 | 3.00
Sousa et al. [104] | 2010 | Private | 3-40 | 33 | 84.84 | N/A
Our system detected 2112 of the 2242 nodules with agreement level one (AL1) between the
observers in 850 CT scans; the remaining 130 nodules were missed by the system (false
negatives). We analysed these missed nodules with respect to size, internal structure and
subtlety. With respect to size, we divided the missed nodules into four categories: (i) nodules
< 4 mm, (ii) nodules of 4 to 6 mm, (iii) nodules of 6 to 8 mm and (iv) nodules > 8 mm. Our
system missed 31 of the 322 nodules smaller than 4 mm, 38 of the 657 nodules of 4 to 6 mm,
34 of the 697 nodules of 6 to 8 mm and 27 of the 566 nodules larger than 8 mm, as shown in
Figure 4-12.
Figure 4-12 Number of False Negatives with respect to Size
Thus, the majority of the missed nodules (69 of 130) had a diameter of 6 mm or less, as shown
in Figure 4-13. We also categorized the missed nodules with respect to texture, which is rated
from 1 (non-solid, i.e., ground-glass nodule) to 5 (solid nodule). We merged the radiologists'
ratings by averaging them and defined a nodule as subsolid if its average texture rating was
less than 5. We found that 31% of the false negatives were subsolid.
Figure 4-13 Percentage of False Negatives with respect to Size
We also categorized the false negatives with respect to subtlety, which the radiologists scored
from 1 (extremely subtle) to 5 (obvious). We merged the radiologists' ratings by averaging
them and defined a nodule as subtle if its average score was less than 3. We found that 25% of
the missed nodules were subtle.
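The merging rules used above can be stated compactly in code. A minimal sketch, assuming each missed nodule carries the list of texture and subtlety scores given by the radiologists who marked it (this data layout and the function names are hypothetical): the scores are averaged, a nodule is labelled subsolid when its mean texture score is below 5, and subtle when its mean subtlety score is below 3. The ratings in the usage lines are invented for illustration.

```python
from statistics import mean


def merge_ratings(scores):
    """Average the scores given by the radiologists who marked a nodule."""
    if not scores:
        raise ValueError("a nodule needs at least one rating")
    return mean(scores)


def is_subsolid(texture_scores):
    # Texture: 1 = non-solid (ground glass), 3 = part-solid, 5 = solid.
    return merge_ratings(texture_scores) < 5


def is_subtle(subtlety_scores):
    # Subtlety: 1 = extremely subtle ... 5 = obvious.
    return merge_ratings(subtlety_scores) < 3


def percentage(flags):
    """Percentage of True values in a list of booleans."""
    return 100.0 * sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical ratings for three missed nodules (one list per nodule).
    texture = [[5, 5, 5], [5, 4], [1, 3, 2, 2]]
    subtlety = [[2, 3, 2], [4, 5], [1, 2, 2, 3]]
    print(percentage([is_subsolid(t) for t in texture]))
    print(percentage([is_subtle(s) for s in subtlety]))
```

Note that a single radiologist rating a nodule below 5 is enough to make it subsolid under this rule, because any rating below 5 pulls the average below 5.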
Figure 4-14 shows the system’s detection sensitivity with respect to nodule size. The detection
sensitivity increases with nodule size and reaches a maximum of 95.22% for nodules larger
than 8 mm.
Figure 4-14 Detection Sensitivity with respect to Nodule Size
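The per-size figures follow directly from the AL1 counts given earlier in this section (322, 657, 697 and 566 nodules in the four size groups, of which 31, 38, 34 and 27 were missed). The short calculation below reproduces the detection sensitivity of each group and the share of missed nodules that are 6 mm or smaller; the results agree with the values quoted in the text to within rounding. The dictionary layout and function names are only for illustration.

```python
# Counts at agreement level 1, taken from the text of Section 4.7.
SIZE_GROUPS = {          # size group: (total nodules, missed nodules)
    "< 4 mm": (322, 31),
    "4-6 mm": (657, 38),
    "6-8 mm": (697, 34),
    "> 8 mm": (566, 27),
}


def sensitivity_by_size(groups):
    """Detection sensitivity (%) per size group: detected / total."""
    return {name: 100.0 * (total - missed) / total
            for name, (total, missed) in groups.items()}


def missed_share(groups, names):
    """Percentage of all missed nodules that fall in the given groups."""
    all_missed = sum(missed for _, missed in groups.values())
    return 100.0 * sum(groups[n][1] for n in names) / all_missed


if __name__ == "__main__":
    for name, value in sensitivity_by_size(SIZE_GROUPS).items():
        print(f"{name}: {value:.2f} %")
    share = missed_share(SIZE_GROUPS, ["< 4 mm", "4-6 mm"])
    print(f"missed nodules of 6 mm or less: {share:.1f} %")
```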
We have further investigated our system’s performance at agreement level 3 (AL3), where there
is majority agreement between the observers. In this case, the number of nodules was reduced
from 2242 to 1160 in the 850 CT scans, and the system’s performance remained almost the same,
with a slight increase in detection sensitivity for nodules < 4 mm, as shown in Figure 4-15.
With AL3 between the observers, the detection sensitivity of our proposed system was 91.32%
for nodules smaller than 4 mm, and 94.32% and 94.68% for nodules of 4 to 6 mm and 6 to 8 mm,
respectively. The system achieved a maximum sensitivity of 94.98% for nodules larger than
8 mm at agreement level 3.
Figure 4-15 Comparison of System’s Overall Performance w.r.t. different agreement levels
In the following, we present some of the nodules missed by our system (false negatives), shown
in Figure 4-16, and discuss their characteristics.
Figure 4-16 Sample Nodules Missed by the Proposed System (False Negatives), indicated by red
arrows. Encircled objects in the respective figures represent False Positives.
Figure 4-16 (a) shows a missed nodule (indicated with a red arrow) with agreement level 4, i.e.,
it was marked by all four radiologists. The diameter of the missed nodule was approximately
7 mm. We investigated the characteristics of the nodule that the radiologists scored on different
scales; for each characteristic, the scores of the individual radiologists were averaged to
produce a single value. The merged subtlety of the missed nodule shown in Figure 4-16 (a) was
2 (subtlety was scored from 1 = extremely subtle to 5 = obvious). Its internal structure was 1
(internal structure was scored on a scale of 1 to 4, where 1 represents soft tissue, 2 fluid, 3 fat
and 4 air). Its internal calcification was 6 (calcification was scored from 1 to 6, where 1 denotes
popcorn, 2 laminated, 3 solid, 4 non-central, 5 central and 6 absence of calcification). Its
sphericity, which describes the roundness of the nodule, was 3 (sphericity was scored from 1 to
5, where 1 represents a linear and 5 a round appearance). Its margin, which describes how well
defined the nodule boundary is, was 3 (margin was scored from 1 = poorly defined to 5 = well
defined). Its spiculation was 5 (spiculation was scored on a scale of 1 to 5 on which only the
extreme values are defined, 1 = no spiculation and 5 = marked spiculation). We considered
spiculation rather than lobulation because spiculation is the more relevant descriptor of a lung
nodule, whereas lobulation is also associated with cancers that may not originate primarily in
the lung; lobulation is therefore not discussed in our analysis. Its texture was 5 (texture was
scored from 1 to 5 with only three values explicitly defined: 1 = non-solid/ground-glass
nodule, 3 = part-solid and 5 = solid). Table 4-7 summarizes the investigated characteristics of
the sample missed nodules shown in Figure 4-16.
Table 4-7: Average Scores of Different Characteristics of Sample False Negatives
Nodule | AL* | D** (mm) | Subtlety | Internal Structure | Calcification | Sphericity | Margin | Spiculation | Texture
Fig. 4-16 (a) | 4 | 7 | 2 | 1 | 6 | 3 | 3 | 5 | 5
Fig. 4-16 (b) | 1 | 19 | 3 | 1 | 6 | 4 | 3 | 2 | 5
Fig. 4-16 (c) | 2 | 6 | 2 | 1 | 6 | 4 | 1 | 1 | 1
Fig. 4-16 (d) | 3 | 4 | 2 | 1 | 6 | 3 | 3 | 1 | 5
Fig. 4-16 (e) | 1 | 4 | 5 | 1 | 3 | 5 | 5 | 1 | 5
*AL denotes the Agreement Level between Observers; **D denotes the Nodule Diameter.
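The rating scales described above can be captured as lookup tables, which makes rows of Table 4-7 easier to read. The sketch below encodes only the categories spelled out in the text; intermediate values of the subtlety, sphericity, margin, spiculation and texture scales have no names there and are therefore reported as numbers. The dictionary and function names are hypothetical, and averaged scores are rounded to the nearest integer before lookup, which is an assumption of this sketch.

```python
INTERNAL_STRUCTURE = {1: "soft tissue", 2: "fluid", 3: "fat", 4: "air"}
CALCIFICATION = {1: "popcorn", 2: "laminated", 3: "solid", 4: "non-central",
                 5: "central", 6: "absent"}
# Only the anchor points named in the text; other scores stay numeric.
SUBTLETY = {1: "extremely subtle", 5: "obvious"}
SPHERICITY = {1: "linear", 5: "round"}
MARGIN = {1: "poorly defined", 5: "well defined"}
SPICULATION = {1: "no spiculation", 5: "marked spiculation"}
TEXTURE = {1: "non-solid/ground glass", 3: "part-solid", 5: "solid"}

SCALES = {"subtlety": SUBTLETY, "internal_structure": INTERNAL_STRUCTURE,
          "calcification": CALCIFICATION, "sphericity": SPHERICITY,
          "margin": MARGIN, "spiculation": SPICULATION, "texture": TEXTURE}


def describe(ratings):
    """Map averaged scores (rounded) to labels where a label is defined."""
    described = {}
    for name, score in ratings.items():
        labels = SCALES[name]
        described[name] = labels.get(round(score), str(score))
    return described


if __name__ == "__main__":
    # Row (a) of Table 4-7.
    row_a = {"subtlety": 2, "internal_structure": 1, "calcification": 6,
             "sphericity": 3, "margin": 3, "spiculation": 5, "texture": 5}
    print(describe(row_a))
```

For row (a), this yields soft tissue, absent calcification, marked spiculation and solid texture, while subtlety, sphericity and margin remain numeric (2, 3 and 3).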
From a close examination of the values in Table 4-7, nodules (a), (c) and (d) of Figure 4-16
can be classed as “very subtle”, with a merged subtlety value of 2 out of 5. Apart from the
absence of calcification, which is common to these three cases, these nodules are also not well
defined, with an average margin rating of 3 (average) for (a) and (d) and 1 (poorly defined)
for (c). The scores of nodule (b) closely resemble those of the cases discussed above, with a
value of 3 for both subtlety and margin, whereas the nodule shown in Figure 4-16 (e)
represents a rare failure of the system, as it was rated obvious (subtlety 5) and well defined
(margin 5). Figure 4-17 shows sample detected nodules (true positives). These samples show
that the proposed system performs well for nodules of different types, sizes and textures.
Figure 4-17 Sample images of nodules detected by the proposed system (True Positive,
highlighted). The arrow in the respective figures indicates a False Positive detected by the
system along with the True Positive.
Figure 4-18 shows the overall performance of our proposed CAD system using free-response
ROC (FROC) curves [139] for the SVM classifier with different kernel functions and the 2-fold
cross validation scheme. The system shows robust and accurate performance in detecting
nodules.
Figure 4-18: FROC curves of the proposed system with respect to the different kernel
functions of SVM classifier.
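An FROC curve plots detection sensitivity against the average number of false positives per scan as the classifier threshold varies. The sketch below assumes each detection is a (score, is_true_positive) pair, that each true nodule contributes at most one true-positive detection, and that the total numbers of nodules and scans are known. This data layout and the function names are hypothetical and are not taken from the proposed system.

```python
def froc_points(detections, num_nodules, num_scans):
    """Return (average FPs per scan, sensitivity) pairs, one per threshold.

    detections: list of (score, is_true_positive) pairs, where each true
                nodule contributes at most one true-positive detection.
    """
    if num_nodules <= 0 or num_scans <= 0:
        raise ValueError("need at least one nodule and one scan")
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Accept every detection whose score equals the current threshold.
        while i < len(ordered) and ordered[i][0] == threshold:
            if ordered[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / num_scans, tp / num_nodules))
    return points


def sensitivity_at(points, max_fps_per_scan):
    """Highest sensitivity reached without exceeding the FP/scan budget."""
    eligible = [s for f, s in points if f <= max_fps_per_scan]
    return max(eligible, default=0.0)


if __name__ == "__main__":
    # Toy detections from 3 scans containing 4 nodules in total.
    dets = [(0.95, True), (0.90, False), (0.85, True),
            (0.60, False), (0.50, True), (0.40, False)]
    points = froc_points(dets, num_nodules=4, num_scans=3)
    print(points)
    print(sensitivity_at(points, 0.5))
```

Unlike an ROC curve, the horizontal axis is not bounded by 1, and a nodule that never produces a candidate caps the achievable sensitivity below 1 (here 3 of the 4 nodules are found).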
Chapter 5: CONCLUSION AND FUTURE PROSPECT
5.1 Conclusion
A well-performing CAD system contributes to health care by assisting expert radiologists in the
detection of lung cancer and providing them with a second opinion. In this dissertation, we have
proposed a method for lung nodule detection based on a hybrid feature set. In the pre-processing
stage, the lung image has been thresholded using optimal thresholding, followed by background
removal, hole filling and lung segmentation. The contours of the segmented lung fields have
then been corrected to include juxta-pleural nodules. The candidate nodules have been detected
and segmented simultaneously from an image enhanced with a multi-scale dot enhancement
filter. Shape, intensity and texture features have been extracted from the nodule candidates and
used for false positive reduction with an SVM classifier. The proposed system has been
evaluated on the LIDC dataset using k-fold cross validation. It achieves a sensitivity of
98.15 % with only 2.19 false positives per scan.
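A widely used form of optimal thresholding for lung CT is the iterative scheme described by Hu et al. [49]: start from an initial threshold, split the voxels into two classes, and reset the threshold to the midpoint of the two class means until it stops changing. The sketch below illustrates that generic scheme on a flat list of intensity values (for example, Hounsfield units). It is not necessarily identical to the implementation used in this work; the initial guess (the overall mean), the tolerance and the function name are choices made for the example.

```python
def optimal_threshold(values, initial=None, tolerance=0.5, max_iter=100):
    """Iterative optimal threshold selection.

    values: intensity samples (e.g. Hounsfield units of all voxels).
    The threshold is repeatedly set to the mean of the average intensity
    below it and the average intensity at or above it.
    """
    if not values:
        raise ValueError("no intensity values given")
    threshold = initial if initial is not None else sum(values) / len(values)
    for _ in range(max_iter):
        low = [v for v in values if v < threshold]
        high = [v for v in values if v >= threshold]
        if not low or not high:
            break  # one class is empty; keep the current threshold
        new_threshold = (sum(low) / len(low) + sum(high) / len(high)) / 2.0
        if abs(new_threshold - threshold) < tolerance:
            return new_threshold
        threshold = new_threshold
    return threshold


if __name__ == "__main__":
    # Toy intensities: air-filled lung parenchyma versus soft tissue (HU).
    lung = [-900, -850, -800, -870]
    tissue = [20, 40, 60, 35]
    t = optimal_threshold(lung + tissue)
    print(t, [v < t for v in lung + tissue])
```

Voxels below the returned threshold would form the initial low-density (lung and background) mask, which the subsequent background removal and hole filling steps refine.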
In this thesis, we have used a hybrid feature set to improve the classification accuracy of the
system. Moreover, our comparison of feature classes clearly indicates that no single feature
class can detect the nodules with high precision. Thus, choosing the right set of features can
improve the overall accuracy of the system by improving the sensitivity and reducing false
positives. We also experimented with different classifiers to assess the performance of the
system, and the results show that SVM, with the flexibility of different kernel functions,
remains a better choice than the other classifiers in terms of accuracy.
5.2 Follow-up Work and Prospects
The computer-aided detection of lung cancer still leaves considerable room for research.
Although there has been a lot of research in this area, further technological development and
research are needed before a commercial product ready for use in hospitals can be formed.
Further work and prospects identified during this research are as follows:
1. The segmentation of suspected pulmonary nodules needs further research and development,
which requires the construction of a more complete template library for pulmonary nodules.
Recently, some researchers have collected a large variety of nodule types and experimented
with creating a pulmonary nodule template library, but the main problem encountered is the
diversity of nodule characteristics.
2. The computer-aided detection system should be tested on sufficiently large datasets to
establish its robustness. Currently, CAD systems are evaluated on relatively small datasets, so
their performance may well be affected in real clinical tests. Large-scale experiments can
demonstrate the completeness and generality of a detection system and allow products to be
developed according to market needs.
3. The selection of the lung nodule feature set is another area that needs further research, in
particular new, more descriptive features that can play a critical role in the classification
phase. Currently, almost 150 features that can be extracted have been reported in the literature.
As the types and number of features increase, feature selection methods can be optimized, the
characteristics and specific performance of individual features can be studied in depth, and the
advantage of feature combination can be maximized. The optimization of the feature set is also
of critical importance because overfitting must be taken into account.
4. The CAD system for lung cancer can include three-dimensional reconstruction and
processing functions. With the rapid development of medical image processing technology,
three-dimensional reconstruction has become an important part of image processing, and
doctors need more complete three-dimensional image products. Preoperative visualization
products based on three-dimensional reconstruction and three-dimensional image segmentation
require in-depth study and research.
5. Another area for future work is the detection of micro nodules (< 3 mm). Future CAD
systems should be able to detect all types of nodules (including micro nodules) while
maintaining the same level of sensitivity and a low number of FPs/scan.
REFERENCES
[1] R. L. Siegel, K. D. Miller, and A. Jemal, “Cancer statistics, 2016,” CA. Cancer J. Clin.,
vol. 66, no. 1, pp. 7–30, Jan. 2016.
[2] J. M. Diaz, R. C. Pinon, and G. Solano, “Lung cancer classification using genetic
algorithm to optimize prediction models,” in IISA 2014, The 5th International
Conference on Information, Intelligence, Systems and Applications, 2014, pp. 1–6.
[3] A. B. Mariotto, K. Robin Yabroff, Y. Shao, E. J. Feuer, and M. L. Brown, “Projections
of the Cost of Cancer Care in the United States: 2010-2020,” JNCI J. Natl. Cancer Inst.,
vol. 103, no. 2, pp. 117–128, Jan. 2011.
[4] N. Howlader, A. M. Noone, M. Krapcho, D. Miller, K. Bishop, S. F. Altekruse, C. L.
Kosary et al., “SEER Cancer Statistics Review, 1975-2013,” National Cancer Institute,
Bethesda, MD, 2016. [Online]. Available: http://seer.cancer.gov/csr. [Accessed: 16-Feb-2016].
[5] C. Tiwari, K. Beyer, and G. Rushton, “The Impact of Data Suppression on Local
Mortality Rates: The Case of CDC WONDER,” Am. J. Public Health, vol. 104, no. 8,
pp. 1386–1388, Aug. 2014.
[6] M. A. Moore, P. Attasara, T. Khuhaprema, T. N. Le, T. H. N. Nguyen, P. P. Raingsey,
S. Sriamporn, H. Sriplung, P. Srivanatanakul, D. T. Bui, S. Wiangnon, and T. Sobue,
“Cancer epidemiology in mainland South-East Asia - past, present and future.,” Asian
Pac. J. Cancer Prev., vol. 11 Suppl 2, pp. 67–80, 2010.
[7] J. Ferlay, I. Soerjomataram, M. Ervik, R. Dikshit, S. Eser, C. Mathers, M. Rebelo, D.
M. Parkin, D. Forman, and F. Bray, “Cancer Incidence and Mortality Worldwide: IARC
CancerBase,” vol. 10, 2008. Lyon, France: International Agency for Research on Cancer, 2013.
[8] D. Forman, J. Ferlay, B. W. Stewart, and C. P. Wild, “The global and regional burden
of cancer,” in World Cancer Report 2014, 2014, pp. 16–53.
[9] M. Luqman, M. M. Javed, S. Daud, N. Raheem, J. Ahmad, and A.-U.-H. Khan, “Risk
factors for lung cancer in the Pakistani population.,” Asian Pac. J. Cancer Prev., vol.
15, no. 7, pp. 3035–9, 2014.
[10] “FACT SHEET: Investing in the National Cancer Moonshot,” Office of the Press
Secretary, The White House, 01 Feb. 2016. [Online]. Available:
https://www.whitehouse.gov/the-press-office/2016/02/01/fact-sheet-investing-national-cancer-moonshot.
[Accessed: 01-Aug-2016].
[11] The National Lung Screening Trial Research Team, “Reduced Lung-Cancer Mortality with
Low-Dose Computed Tomographic Screening,” N. Engl. J. Med., vol. 365, no. 5, pp. 395–409,
Aug. 2011.
[12] W. J. Choi and T. S. Choi, “Automated pulmonary nodule detection based on three-
dimensional shape-based feature descriptor,” Comput. Methods Programs Biomed., vol.
113, no. 1, pp. 37–54, 2014.
[13] I. R. S. Valente, P. C. Cortez, E. C. Neto, J. M. Soares, V. H. C. de Albuquerque, and J.
M. R. S. Tavares, “Automatic 3D pulmonary nodule detection in CT images: A survey,”
Comput. Methods Programs Biomed., vol. 124, pp. 91–107, 2015.
[14] W. J. Kostis, A. P. Reeves, D. F. Yankelevitz, and C. I. Henschke, “Three-Dimensional
Segmentation and Growth-Rate Estimation of Small Pulmonary Nodules in Helical CT
Images,” IEEE Trans. Med. Imaging, vol. 22, no. 10, pp. 1259–1274, 2003.
[15] C. I. Henschke, D. I. McCauley, D. F. Yankelevitz, D. P. Naidich, G. McGuinness, O.
S. Miettinen, D. M. Libby, M. W. Pasmantier, J. Koizumi, N. K. Altorki, and J. P. Smith,
“Early Lung Cancer Action Project: overall design and findings from baseline
screening.,” Lancet (London, England), vol. 354, no. 9173, pp. 99–105, Jul. 1999.
[16] S. S. Parveen and C. Kavitha, “A Review on Computer Aided Detection and Diagnosis
of lung cancer nodules,” Int. J. Comput. Technol., vol. 3, no. 3, pp. 393–400, 2012.
[17] M. Mabrouk, A. Karrar, and A. Sharawy, “Computer Aided Detection of Large Lung
Nodules using Chest Computer Tomography Images,” Computer (Long. Beach. Calif).,
vol. 3, no. 9, pp. 12–18, 2012.
[18] W. J. Choi and T. S. Choi, “Automated pulmonary nodule detection system in computed
tomography images: A hierarchical block classification approach,” Entropy, vol. 15, no.
2, pp. 507–523, 2013.
[19] D. J. Brenner and E. J. Hall, “Computed Tomography — An Increasing Source of
Radiation Exposure,” N. Engl. J. Med., vol. 357, no. 22, pp. 2277–2284, Nov. 2007.
[20] S. Diederich, M. Lentschig, T. Overbeck, D. Wormanns, and W. Heindel, “Detection of
pulmonary nodules at spiral CT: comparison of maximum intensity projection sliding
slabs and single-image reporting,” Eur. Radiol., vol. 11, no. 8, pp. 1345–1350, Aug.
2001.
[21] K. Doi, “Computer-aided diagnosis in medical imaging: historical review, current status
and future potential.,” Comput. Med. Imaging Graph., vol. 31, no. 4–5, pp. 198–211,
2007.
[22] S. Zhou, Y. Cheng, and S. Tamura, “Automated lung segmentation and smoothing
techniques for inclusion of juxtapleural nodules and pulmonary vessels on chest CT
images,” Biomed. Signal Process. Control, vol. 13, pp. 62–70, Sep. 2014.
[23] “R2 ImageChecker CAD - View all - Medical Imaging - Christie InnoMed.” [Online].
Available: http://www.christieinnomed.com/en/r2-imagechecker-cad. [Accessed: 27-
Jul-2017].
[24] “Syngo LungCARE CT and syngo Lung CAD1.” [Online]. Available:
https://www.healthcare.siemens.com/computed-tomography/options-upgrades/clinical-
applications/syngo-lungcare-ct-and-syngo-lung-cad. [Accessed: 27-Jul-2017].
[25] “Veolity - a brand of MeVis Medical Solutions AG: Home.” [Online]. Available:
http://www.veolity.com/. [Accessed: 27-Jul-2017].
[26] S. Schalekamp, B. van Ginneken, B. Heggelman, M. Imhof-Tas, I. Somers, M. Brink,
M. Spee, C. Schaefer-Prokop, and N. Karssemeijer, “New methods for using computer-
aided detection information for the detection of lung nodules on chest radiographs,” Br.
J. Radiol., vol. 87, no. 1036, p. 20140015, Apr. 2014.
[27] S. G. Armato, G. McLennan, D. Hawkins, L. Bidaut, M. F. McNitt-Gray, C. R. Meyer,
A. P. Reeves, B. Zhao, D. R. Aberle, C. I. Henschke, E. A. Hoffman, E. A. Kazerooni,
H. MacMahon, E. J. R. van Beek, D. Yankelevitz, A. M. Biancardi, P. H. Bland, M.
S. Brown, R. M. Engelmann, G. E. Laderach, D. Max, R. C. Pais, D. P. Y. Qing, R. Y.
Roberts, A. R. Smith, A. Starkey, P. Batrah, P. Caligiuri, A. Farooqi, G. W. Gladish, C.
M. Jude, R. F. Munden, I. Petkovska, L. E. Quint, L. H. Schwartz, B. Sundaram, L. E.
Dodd, C. Fenimore, D. Gur, N. Petrick, J. Freymann, J. Kirby, B. Hughes, A. Vande
Casteele, S. Gupte, M. Sallamm, M. D. Heath, M. H. Kuhn, E. Dharaiya, R. Burns, D.
S. Fryd, M. Salganicoff, V. Anand, U. Shreter, S. Vastagh, B. Y. Croft, and L. P. Clarke,
“The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative
(IDRI): a completed reference database of lung nodules on CT scans.,” Med. Phys., vol.
38, no. 2, pp. 915–931, 2011.
[28] C. I. Henschke, D. I. McCauley, D. F. Yankelevitz, D. P. Naidich, G. McGuinness, O.
S. Miettinen, D. Libby, M. Pasmantier, J. Koizumi, N. Altorki, and J. P. Smith, “Early
lung cancer action project: a summary of the findings on baseline screening.,”
Oncologist, vol. 6, no. 2, pp. 147–52, 2001.
[29] Public Lung Image database to address drug response. Vision and Image Analysis Group
(VIA) and International Early Lung Cancer Action Program (I-ELCAP) Labs, Cornell
University. http://www.via.cornell.edu/crpf.html; 2008 [accessed 24-04-16].
[30] W.J. Choi and T.S. Choi, “Genetic programming-based feature transform and
classification for the automatic detection of pulmonary nodules on computed
tomography images,” Inf. Sci. (Ny)., vol. 212, pp. 57–78, 2012.
[31] J. Dehmeshki, X. Ye, X. Lin, M. Valdivieso, and H. Amin, “Automated detection of
lung nodules in CT images using shape-based genetic algorithm,” Comput. Med.
Imaging Graph., vol. 31, no. 6, pp. 408–417, 2007.
[32] J.J. Suárez-Cuenca, P. G. Tahoces, M. Souto, et al., “Application of the iris filter for
automatic detection of pulmonary nodules on computed tomography images,” Comput.
Biol. Med., vol. 39, no. 10, pp. 921–933, 2009.
[33] X. Ye, X. Lin, J. Dehmeshki, et al., “Shape based computer-aided detection of lung
nodules in thoracic CT images,” IEEE Trans. Biomed. Eng., vol. 56, no. 7, pp. 1810–1820,
2009.
[34] I. Sluimer, M. Prokop, and B. van Ginneken, “Toward automated segmentation of the
pathological lung in CT,” IEEE Trans. Med. Imaging., vol. 24, no. 8, pp. 1025–1038,
2005.
[35] G. De Nunzio, E. Tommasi, A. Agrusti, et al., “Automatic lung segmentation in CT
images with accurate handling of the hilar region,” J. Digit. Imaging, vol. 24, no. 1, pp.
11–27, 2011.
[36] A. M. Ali and A. A. Farag, “Automatic Lung Segmentation of Volumetric Low-Dose
CT Scans Using Graph Cuts,” in Advances in Visual Computing: 4th International
Symposium, ISVC 2008, Las Vegas, NV, USA, December 1-3, 2008. Proceedings, Part
I, G. Bebis, R. Boyle, B. Parvin, D. Koracin, P. Remagnino, F. Porikli, J. Peters, J.
Klosowski, L. Arns, Y. K. Chun, T.-M. Rhyne, and L. Monroe, Eds. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2008, pp. 258–267.
[37] E. van Rikxoort, B. de Hoop, and M. Viergever, “Automatic lung segmentation from
thoracic computed tomography scans using a hybrid approach with error detection,”
Med. Phys., vol. 36, no. 7, p. 2934, 2009.
[38] D.S. Paik, C. F. Beaulieu, G. D. Rubin, et al., “Surface normal overlap: a computer-
aided detection algorithm with application to colonic polyps and lung nodules in helical
CT,” IEEE Trans. Med. Imaging, vol. 23, no. 6, pp. 661–675, 2004.
[39] A. Besbes and N. Paragios, “Landmark-based segmentation of lungs while handling
partial correspondences using sparse graph-based priors,” in Proceedings of the
International Symposium on Biomedical Imaging (ISBI ’11), 2011, pp. 989–995.
[40] M. Sofka, J. Wetzl, N. Birkbeck et al., “Multi-stage learning for robust lung segmentation
in challenging CT volumes,” in Proceedings of the International Conference on Medical Image
Computing and Computer-Assisted Intervention (MICCAI ’11), 2011, pp. 667–674.
[41] S. Sun, C. Bauer, and R. Beichel, “Automated 3-D segmentation of lungs with lung
cancer in CT data using a novel robust active shape model approach,” IEEE
Transactions on Medical Imaging, vol. 31, no. 2, pp. 449–460, 2012.
[42] A. Mansoor, U. Bagci, Z. Xu, B. Foster, K. N. Olivier, J. M. Elinoff, A. F. Suffredini, J.
K. Udupa, and D. J. Mollura, “A Generic Approach to Pathological Lung
Segmentation,” IEEE Trans. Med. Imaging, vol. 33, no. 12, pp. 2293–2310, Dec. 2014.
[43] S. Dai, K. Lu, J. Dong, Y. Zhang, and Y. Chen, “A novel approach of lung segmentation
on chest CT images using graph cuts,” Neurocomputing, vol. 168, pp. 799–807, Nov.
2015.
[44] A. Soliman, F. Khalifa, A. Elnakib, M. Abou El-Ghar, N. Dunlap, B. Wang, G.
Gimel’farb, R. Keynton, and A. El-Baz, “Accurate Lungs Segmentation on CT Chest
Images by Adaptive Appearance-Guided Shape Modeling,” IEEE Trans. Med. Imaging,
vol. 36, no. 1, pp. 263–276, Jan. 2017.
[45] A. M. Mendonca, J. A. da Silva, and A. Campilho, “Automatic delimitation of lung fields
on chest radiographs,” in proceedings of the International Symposium on Biomedical
Imaging (ISBI ’04), vol. 2, 2004, pp. 1287–1290.
[46] Y. Yim, H. Hong, and Y. G. Shin, “Hybrid lung segmentation in chest CT images for
computer-aided diagnosis,” in 7th International Workshop on Enterprise Networking
and Computing in Healthcare Industry, HEALTHCOM 2005, June 2005, pp. 378–383.
[47] P. Campadelli, E. Casiraghi, and D. Artioli, “A fully automated method for lung nodule
detection from postero-anterior chest radiographs,” IEEE Transactions on Medical
Imaging, vol. 25, no. 12, pp. 1588–1603, 2006.
[48] P. Korfiatis, S. Skiadopoulos, P. Sakellaropoulos, C. Kalogeropoulou, and L. Costaridou,
“Combining 2D wavelet edge highlighting and 3D thresholding for lung segmentation
in thin-slice CT,” British Journal of Radiology, vol. 80, no. 960, pp. 996–1005, 2007.
[49] S. Hu, E. A. Hoffman, and J. M. Reinhardt, “Automatic lung segmentation for accurate
quantitation of volumetric X-ray CT images,” IEEE Transactions on Medical Imaging,
vol. 20, no. 6, pp. 490–498, 2001.
[50] Q. Gao, S. Wang, D. Zhao, and J. Liu, “Accurate lung segmentation for X-ray CT
images,” in Proceedings of the 3rd International Conference on Natural Computation
(ICNC ’07), vol. 2, 2007, pp. 275–279.
[51] Z. Shi, J. Ma, M. Zhao, Y. Liu, Y. Feng, M. Zhang, L. He, and K. Suzuki, “Many Is
Better Than One: An Integration of Multiple Simple Strategies for Accurate Lung
Segmentation in CT Images,” Biomed Res. Int., vol. 2016, pp. 1–13, Aug. 2016.
[52] Y. Shi, F. Qi, Z. Xue et al., “Segmenting lung fields in serial chest radiographs using
both population-based and patient-specific shape statistics,” IEEE Transactions on
Medical Imaging, vol. 27, no. 4, pp. 481–494, 2008.
[53] A. El-Baz, G. Gimel’farb, R. Falk, M. Abou El-Ghar, T. Holland, and T. Shafer, “A
new stochastic framework for accurate lung segmentation,” in Proceedings of the
International Conference on Medical Image Computing and Computer-Assisted
Intervention (MICCAI ’08), New York, NY, USA, September 2008, pp. 322–330.
[54] P. Annangi, S. Thiruvenkadam, A. Raja, H. Xu, X. Sun, and L. Mao, “Region based
active contour method for x-ray lung segmentation using prior shape and low level
features,” in Proceedings of the 7th IEEE International Symposium on Biomedical
Imaging: from Nano to Macro (ISBI ’10), April 2010, pp. 892–895.
[55] P. P. Rebouças Filho, P. C. Cortez, A. C. da Silva Barros, V. H. C. Albuquerque, and J.
M. R. S. Tavares, “Novel and powerful 3D adaptive crisp active contour method applied
in the segmentation of CT lung images,” Med. Image Anal., vol. 35, pp. 503–516, Jan.
2017.
[56] A. El-Baz, G. M. Beache, G. Gimel’Farb, K. Suzuki, K. Okada, A. Elnakib, A. Soliman,
and B. Abdollahi, “Computer-aided diagnosis systems for lung cancer: Challenges and
methodologies,” Int. J. Biomed. Imaging, vol. 2013, 2013.
[57] S. Akram, M. Y. Javed, M. U. Akram, U. Qamar, and A. Hassan, “Pulmonary Nodules
Detection and Classification Using Hybrid Features from Computerized Tomographic
Images,” J. Med. Imaging Heal. Informatics, vol. 6, no. 1, pp. 252–259, Feb. 2016.
[58] J. P. Ko and M. Betke, “Chest CT: automated nodule detection and assessment of change
over time—preliminary experience,” Radiology, vol. 218, no. 1, pp. 267–273, 2001.
[59] B. Zhao, M. S. Ginsberg, R. A. Lefkowitz, L. Jiang, C. Cooper, and L.H. Schwartz,
“Application of the LDM algorithm to identify small lung nodules on low dose MSCT
scans,” in Proceedings of the Progress in Biomedical Optics and Imaging, Medical
Imaging 2004: Image Processing, February 2004, pp. 818–823.
[60] L. Gonçalves, J. Novo, and A. Campilho, “Hessian based approaches for 3D lung nodule
segmentation,” Expert Syst. Appl., vol. 61, pp. 1–15, Nov. 2016.
[61] B. Chen, T. Kitasaka, H. Honma, H. Takabatake, M. Mori, H. Natori, and K. Mori,
“Automatic segmentation of pulmonary blood vessels and nodules based on local
intensity structure analysis and surface propagation in 3D chest CT images,” Int. J.
Comput. Assist. Radiol. Surg., vol. 7, no. 3, pp. 465–482, 2012.
[62] Q. Li and K. Doi, “New selective nodule enhancement filter and its application for
significant improvement of nodule detection on computed tomography,” in Medical Imaging
2004: Image Processing, San Diego, CA, USA, Feb. 14, 2004, pp. 1–9.
[63] H. Hasanabadi, M. Zabihi, and Q. Mirsharif, “Detection of pulmonary nodules in CT
images using template matching and neural classifier,” Journal of Advances in Computer
Research, vol. 5, no. 1, pp. 19–28, 2014.
[64] R. Wiemker, P. Rogalla, A. Zwartkruis, and T. Blaffert, “Computer aided lung nodule
detection on high resolution CT data,” in Medical Imaging: Image Processing, vol.
4684 of Proceedings of SPIE, February 2002, pp. 677–688.
[65] Y. Lee, T. Hara, H. Fujita, S. Itoh, and T. Ishigaki, “Automated detection of pulmonary
nodules in helical CT images based on an improved template-matching technique,”
IEEE Transactions on Medical Imaging, vol. 20, no. 7, pp. 595–604, 2001.
[66] A. El-Baz, A. Elnakib, M. Abou El-Ghar, G. Gimel’Farb, R. Falk, and A. Farag,
“Automatic detection of 2D and 3D lung nodules in chest spiral CT scans,” Int. J.
Biomed. Imaging, vol. 2013, 2013.
[67] D. Cascio, R. Magro, F. Fauci, M. Iacomi, and G. Raso, “Automatic detection of lung
nodules in CT datasets based on stable 3D mass-spring models,” Comput. Biol. Med., vol. 42,
no. 11, pp. 1098–1109, 2012.
[68] S. Soltaninejad, M. Keshani, and F. Tajeripour, “Lung nodule detection by KNN
classifier and active contour modelling and 3D visualization,” in The 16th CSI International
Symposium on Artificial Intelligence and Signal Processing (AISP 2012), IEEE, Shiraz, Fars,
Iran, May 2–3, 2012, pp. 440–445.
[69] J. Pu, D. S. Paik, X. Meng, J. Roos, and G. D. Rubin, “Shape ‘Break-and-
Repair’ Strategy and Its Application to Automated Medical Image Segmentation,”
IEEE Trans. Vis. Comput. Graph., vol. 17, no. 1, pp. 115–124, Jan. 2011.
[70] T. Kubota, A. K. Jerebko, M. Dewan, M. Salganicoff, and A. Krishnan, “Segmentation
of pulmonary nodules of various densities with morphological approaches and
convexity models,” Med. Image Anal., vol. 15, no. 1, pp. 133–154, 2011.
[71] A. Riccardi, T. S. Petkov, G. Ferri, M. Masotti, and R. Campanini, “Computer-aided
detection of lung nodules via 3D fast radial transform, scale space representation, and
Zernike MIP classification,” Med. Phys., vol. 38, no. 4, pp. 1962–1971, 2011.
[72] S. Taghavi Namin, H. Abrishami Moghaddam, R. Jafari, M. Esmaeil-Zadeh, and M. Gity,
“Automated detection and classification of pulmonary nodules in 3D thoracic CT images,”
in IEEE International Conference on Systems, Man and Cybernetics, Istanbul, Turkey,
Oct. 10–13, 2010, pp. 3774–3779.
[73] K. Murphy, A. Schilham, H. Gietema, M. Prokop, and B. van Ginneken, “Automated
detection of pulmonary nodules from low-dose computed tomography scans using a two-stage
classification system based on local image features,” in Medical Imaging 2007:
Computer-Aided Diagnosis, International Society for Optics and Photonics, San Diego, CA,
USA, Feb. 17, 2007, pp. 651410-1–651410-12.
[74] S. Ozekes, O. Osman, and O. N. Ucan, “Nodule detection in a lung region that’s
segmented with using genetic cellular neural networks and 3D template matching with
fuzzy rule based thresholding,” Korean J. Radiol., vol. 9, no. 1, pp. 1–9, 2008.
[75] Z. Ge, B. Sahiner, H.-P. Chan, L. M. Hadjiiski, P. N. Cascade, N. Bogot, E. A. Kazerooni,
J. Wei, and C. Zhou, “Computer-aided detection of lung nodules: False positive reduction
using a 3D gradient field method and 3D ellipsoid fitting,” Med. Phys., vol. 32, no. 8, pp.
2443–2454, 2005.
[76] P. R. S. Mendonca, R. Bhotika, S. A. Sirohey, W. D. Turner, J. V. Miller, and R. S.
Avila, “Model-based analysis of local shape for lesion detection in CT scans,” in
Proceedings of the International Conference on Medical Image Computing and
Computer-Assisted Intervention (MICCAI ’05), vol. 8, 2005, pp. 688–695.
[77] S. Chang, H. Emoto, D. N. Metaxas, and L. Axe, “Pulmonary micronodule detection
from 3D chest CT,” in Proceedings of the International Conference on Medical Image
Computing and Computer-Assisted Intervention (MICCAI ’04), vol. 3217, 2004, pp.
821–828.
[78] H. Takizawa, K. Shigemoto, S. Yamamoto et al., “A recognition method of lung nodule
shadows in X-Ray CT images using 3D object models,” International Journal of Image
and Graphics, vol. 3, no. 4, pp. 533–545, 2003.
[79] G. Agam and S. Armato, “Vessel tree reconstruction in thoracic CT scans with application
to nodule detection,” IEEE Trans. Med. Imaging, vol. 24, no. 4, pp. 486–499, 2005.
[80] K. Awai, K. Murao, A. Ozawa et al., “Pulmonary nodules at chest CT: effect of computer-
aided diagnosis on radiologists’ detection performance,” Radiology, vol. 230, no. 2, pp.
347–352, 2004.
[81] C. I. Fetita, F. Prêteux, C. Beigelman-Aubry, and P. Grenier, “3D automated lung nodule
segmentation in HRCT,” in Proceedings of the International Conference on Medical
Image Computing and Computer-Assisted Intervention (MICCAI ’03), vol. 2878,
2003, pp. 626–634.
[82] M. Tanino, H. Takizawa, S. Yamamoto, T. Matsumoto, Y. Tateno, and T. Iinuma, “A
detection method of ground glass opacities in chest X-ray CT images using automatic
clustering techniques,” in Medical Imaging: Image Processing, vol. 5032 of
Proceedings of SPIE, February 2003, pp. 1728–1737.
[83] T. Ezoe, H. Takizawa, S. Yamamoto et al., “An automatic detection method of lung
cancers including ground glass opacities from chest X-ray CT images,” in Medical
Imaging: Image Processing, vol. 4684 of Proceedings of SPIE, February 2002, pp.
1672–1680.
[84] S. Saita, T. Oda, M. Kubo et al., “Nodule detection algorithm based on multi-slice CT
images for lung cancer screening,” in Medical Imaging: Image Processing,
Proceedings of SPIE, February 2004, pp. 1083–1090.
[85] T. Oda, M. Kubo, Y. Kawata et al., “A detection algorithm of lung cancer candidate
nodules on multi-slice CT images,” in Medical Imaging: Image Processing, vol. 5370
of Proceedings of SPIE, February 2002, pp. 1354–1361.
[86] N. Yamada, M. Kubo, Y. Kawata et al., “ROI extraction of chest CT images using
adaptive opening filter,” in Medical Imaging: Image Processing, vol. 5032 of
Proceedings of SPIE, February 2003, pp. 869–876.
[87] M. N. Gurcan, B. Sahiner, N. Petrick et al., “Lung nodule detection on thoracic computed
tomography images: preliminary evaluation of a computer-aided diagnosis system,”
Medical Physics, vol. 29, no. 11, pp. 2552–2558, 2002.
[88] M. Kubo, K. Kubota, N. Yamada et al., “A CAD system for lung cancer based on low
dose single-slice CT image,” in Medical Imaging: Image Processing, vol. 4684 of
Proceedings of SPIE, February 2002, pp. 1262–1269.
[89] Y. Mekada, T. Kusanagi, Y. Hayase, K. Mori, J.-I. Hasegawa, J.-I. Toriwaki, M. Mori,
and H. Natori, “Detection of small nodules from 3D chest X-ray CT images based on shape
features,” Int. Congr. Ser., vol. 1256, pp. 971–976, 2003.
[90] M. S. Brown, M. F. McNitt-Gray, J. G. Goldin, R. D. Suh, J. W. Sayre, and D. R. Aberle,
“Patient-specific models for lung nodule detection and surveillance in CT images,”
IEEE Transactions on Medical Imaging, vol. 20, no. 12, pp. 1242–1250, 2001.
[91] Y. Kawata, N. Niki, H. Ohmatsu et al., “Computer aided diagnosis of pulmonary nodules
using three-dimensional thoracic CT images,” in Proceedings of the International
Conference on Medical Image Computing and Computer-Assisted Intervention
(MICCAI ’01), vol. 2208, 2001, pp. 1393–1394.
[92] T. Messay, R. C. Hardie, S. K. Rogers, A. Ekin, V. Romano, T. Bülow, N. Bogot, C.
Zhou, A. Chughtai, C. Poopat, et al., “A new computationally efficient CAD system
for pulmonary nodule detection in CT imagery,” Med. Image Anal., vol. 14, no. 3, pp.
390–406, Jun. 2010.
[93] S. L. A. Lee, A. Z. Kouzani, and E. J. Hu, “Random forest based lung nodule
classification aided by clustering,” Comput. Med. Imaging Graph., vol. 34, no. 7, pp.
535–542, 2010.
[94] M. Niemeijer, M. Loog, M. D. Abramoff, M. A. Viergever, M. Prokop, and B. van
Ginneken, “On Combining Computer-Aided Detection Systems,” IEEE Trans. Med.
Imaging, vol. 30, no. 2, pp. 215–223, Feb. 2011.
[95] P. G. Espejo, S. Ventura, and F. Herrera, “A Survey on the Application of Genetic
Programming to Classification,” IEEE Trans. Syst., Man, Cybern. Part C Appl. Rev., vol.
40, no. 2, pp. 121–144, 2010.
[96] S. C. B. Lo, H. Li, Y. Wang, L. Kinnard, and M. T. Freedman, “A multiple circular path
convolution neural network system for detection of mammographic masses,” IEEE
Transactions on Medical Imaging, vol. 21, no. 2, pp. 150–158, 2002.
[97] K. Suzuki, “A supervised ‘lesion-enhancement’ filter by use of a massive-training
artificial neural network (MTANN) in computer-aided diagnosis (CAD),” Physics in
Medicine and Biology, vol. 54, no. 18, pp. S31–S45, 2009.
[98] K. Sirinukunwattana, S. E. A. Raza, Y.-W. Tsang, D. R. J. Snead, I. A. Cree, and N. M.
Rajpoot, “Locality Sensitive Deep Learning for Detection and Classification of Nuclei
in Routine Colon Cancer Histology Images,” IEEE Trans. Med. Imaging, vol. 35, no.
5, pp. 1196–1206, May 2016.
[99] K. Murphy, B. van Ginneken, A. M. R. Schilham, B. J. de Hoop, H. A. Gietema, and M.
Prokop, “A large-scale evaluation of automatic pulmonary nodule detection in chest
CT using local image features and k-nearest-neighbour classification,” Med. Image
Anal., vol. 13, no. 5, pp. 757–770, 2009.
[100] W. Guo, Y. Wei, H. Zhou, and D. Xue, “An adaptive lung nodule detection algorithm,”
in Chinese Control and Decision Conference, IEEE, 2009, pp. 2361–2365.
[101] Y. Liu, J. Yang, D. Zhao, and J. Liu, “Computer aided detection of lung nodules based
on voxel analysis utilizing support vector machines,” in FBIE 2009, 2009 Int. Conf. Future
Biomed. Inf. Eng., 2009, pp. 90–93.
[102] A. Retico, M. E. Fantacci, I. Gori, P. Kasae, B. Golosio, A. Piccioli, P. Cerello, G.
De Nunzio, and S. Tangaro, “Pleural nodule identification in low-dose and thin-slice
lung computed tomography,” Comput. Biol. Med., vol. 39, no. 12, pp. 1137–1144, 2009.
[103] S. Ozekes and O. Osman, “Computerized lung nodule detection using 3D feature
extraction and learning based algorithms,” J. Med. Syst., vol. 34, no. 2, pp. 185–194,
2010.
[104] J. R. F. D. S. Sousa, A. C. Silva, A. C. de Paiva, and R. A. Nunes, “Methodology for
automatic detection of lung nodules in computerized tomography images,” Comput.
Methods Programs Biomed., vol. 98, no. 1, pp. 1–14, 2010.
[105] M. Assefa, I. Faye, A. S. Malik, and M. Shoaib, “Lung nodule detection using
multi-resolution analysis,” in 2013 ICME Int. Conf. Complex Med. Eng., 2013, pp. 457–461.
[106] A. Tariq, M. U. Akram, and M. Y. Javed, “Lung nodule detection in CT images using
neuro fuzzy classifier,” in 2013 Fourth Int. Workshop Comput. Intell. Med. Imaging, 2013,
pp. 49–53.
[107] H. M. Orozco, O. O. V. Villegas, H. D. J. O. Dominguez, and V. G. C. Sanchez, “Lung
Nodule Classification in CT Thorax Images Using Support Vector Machines,” in 2013 12th
Mex. Int. Conf. Artif. Intell., 2013, pp. 277–283.
[108] A. Tartar, N. Kilic, and A. Akan, “Classification of pulmonary nodules by using hybrid
features,” Comput. Math. Methods Med., vol. 2013, pp. 1–11, 2013.
[109] A. Teramoto, H. Fujita, K. Takahashi, O. Yamamuro, T. Tamaki, M. Nishio, and T.
Kobayashi, “Hybrid method for the detection of pulmonary nodules using positron
emission tomography/computed tomography: A preliminary study,” Int. J. Comput.
Assist. Radiol. Surg., vol. 9, no. 1, pp. 59–69, 2014.
[110] B. van Ginneken, A. A. A. Setio, C. Jacobs, and F. Ciompi, “Off-the-shelf convolutional
neural network features for pulmonary nodule detection in computed tomography
scans,” in 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI),
2015, pp. 286–289.
[111] A. A. A. Setio, F. Ciompi, G. Litjens, P. Gerke, C. Jacobs, S. J. van Riel, M. M. W.
Wille, M. Naqibullah, C. I. Sanchez, and B. van Ginneken, “Pulmonary Nodule
Detection in CT Images: False Positive Reduction Using Multi-View Convolutional
Networks,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1160–1169, May 2016.
[112] R. Anirudh, J. J. Thiagarajan, T. Bremer, and H. Kim, “Lung nodule detection using
3D convolutional neural networks trained on weakly labeled data,” in Medical Imaging
2016: Computer-Aided Diagnosis, Proc. SPIE, vol. 9785, p. 978532, 2016.
[113] C. Jacobs, E. M. van Rikxoort, K. Murphy, M. Prokop, C. M. Schaefer-Prokop, and B.
van Ginneken, “Computer-aided detection of pulmonary nodules: a comparative study
using the public LIDC/IDRI database,” Eur. Radiol., vol. 26, no. 7, pp. 2139–2147, Jul.
2016.
[114] J. Ding, A. Li, Z. Hu, and L. Wang, “Accurate Pulmonary Nodule Detection in
Computed Tomography Images Using Deep Convolutional Neural Networks,” arXiv
preprint arXiv:1706.04303, Jun. 2017.
[115] A. A. A. Setio, A. Traverso, T. de Bel, M. S. N. Berens, C. van den Bogaard, P. Cerello,
H. Chen, Q. Dou, M. E. Fantacci, B. Geurts, R. van der Gugten, P. A. Heng, B. Jansen,
M. M. J. de Kaste, V. Kotov, J. Y.-H. Lin, J. T. M. C. Manders, A. Sóñora-Mengana, J.
C. García-Naranjo, E. Papavasileiou, M. Prokop, M. Saletta, C. M. Schaefer-Prokop, E.
T. Scholten, L. Scholten, M. M. Snoeren, E. L. Torres, J. Vandemeulebroucke, N.
Walasek, G. C. A. Zuidhof, B. van Ginneken, and C. Jacobs, “Validation, comparison,
and combination of algorithms for automatic detection of pulmonary nodules in
computed tomography images: The LUNA16 challenge,” Med. Image Anal., vol. 42,
pp. 1–13, Dec. 2017.
[116] W. Zhu, C. Liu, W. Fan, and X. Xie, “DeepLung: 3D Deep Convolutional Nets for
Automated Pulmonary Nodule Detection and Classification,” Sep. 2017, arXiv preprint
arXiv:1709.05538.
[117] T. Huang, G. Yang, and G. Tang, “A fast two-dimensional median filtering algorithm,”
IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 1, pp. 13–18, Feb. 1979.
[118] S. G. Armato, M. L. Giger, C. J. Moran, J. T. Blackburn, K. Doi, and H. MacMahon,
“Computerized detection of pulmonary nodules on CT scans,” Radiographics, vol. 19,
no. 5, pp. 1303–1311, 1999.
[119] Z. Shi, M. Zhao, Y. Wang, L. He, K. Suzuki, C. Jin, and M. Zhang, “Hessian-log: A
novel dot enhancement filter,” ICIC Express Lett. Part B Appl., vol. 6, no. 8, pp. 1987–
1992, 2012.
[120] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Prentice Hall,
2008, ISBN: 9780131687288.
[121] Q. Li, S. Sone, and K. Doi, “Selective enhancement filters for nodules, vessels, and
airway walls in two- and three-dimensional CT scans,” Med. Phys., vol. 30, no. 8, pp.
2040–2051, 2003.
[122] L. Shapiro and G. Stockman, “Computer Vision,” Prentice Hall, 2001, p. 580, ISBN:
9780130307965.
[123] Y. Yu and H. Zhao, “Enhancement Filter for Computer-Aided Detection of Pulmonary
Nodules on Thoracic CT images,” Sixth Int. Conf. Intell. Syst. Des. Appl., vol. 2, 2006,
pp. 1200–1205.
[124] S. L. A. Lee, A. Z. Kouzani, and E. J. Hu, “Automated detection of lung nodules in
computed tomography images: A review,” Mach. Vis. Appl., vol. 23, no. 1, pp. 151–
163, 2012.
[125] R. M. Haralick, K. Shanmugam, and I. Dinstein, “Textural Features for Image
Classification,” IEEE Trans. Syst. Man. Cybern., vol. 3, no. 6, pp. 610–621, Nov. 1973.
[126] G. M. Xian, “An identification method of malignant and benign liver tumors from
ultrasonography based on GLCM texture features and fuzzy SVM,” Expert Syst. Appl.,
vol. 37, no. 10, pp. 6737–6741, 2010.
[127] J. L. Rodgers and W. A. Nicewander, “Thirteen Ways to Look at the Correlation
Coefficient,” Am. Stat., vol. 42, no. 1, p. 59, Feb. 1988.
[128] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin
classifiers,” in Proceedings of the fifth annual workshop on Computational learning
theory - COLT ’92, 1992, pp. 144–152.
[129] T. Sun, J. Wang, X. Li, P. Lv, F. Liu, Y. Luo, Q. Gao, H. Zhu, and X. Guo,
“Comparative evaluation of support vector machines for computer aided diagnosis of
lung cancer in CT based on a multi-dimensional data set,” Comput. Methods Programs
Biomed., vol. 111, no. 2, pp. 519–524, 2013.
[130] N. S. Altman, “An Introduction to Kernel and Nearest-Neighbor Nonparametric
Regression,” Am. Stat., vol. 46, no. 3, pp. 175–185, Aug. 1992.
[131] J. R. Quinlan, “Simplifying decision trees,” Int. J. Man. Mach. Stud., vol. 27, no. 3, pp.
221–234, Sep. 1987.
[132] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San
Diego, Calif, USA, 2nd edition, 1990, ISBN: 9780122698514.
[133] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, “A Practical Guide to Support Vector
Classification,” Technical Report, Department of Computer Science, National Taiwan
University, 2003.
[134] O. Chapelle and A. Zien, “Semi-Supervised Classification by Low Density Separation,”
in Proceedings of the Tenth International Workshop on Artificial Intelligence and
Statistics (AISTATS 2005), 2005, pp. 57–64.
[135] J. A. Swets, “Measuring the accuracy of diagnostic systems,” Science, vol. 240, no.
4857, pp. 1285–1293, Jun. 1988.
[136] Q. Dou, H. Chen, L. Yu, J. Qin, and P.-A. Heng, “Multilevel Contextual 3-D CNNs for
False Positive Reduction in Pulmonary Nodule Detection,” IEEE Trans. Biomed. Eng.,
vol. 64, no. 7, pp. 1558–1567, Jul. 2017.
[137] M. Bergtholdt, R. Wiemker, and T. Klinder, “Pulmonary nodule detection using a
cascaded SVM classifier,” in Medical Imaging 2016: Computer-Aided Diagnosis, Proc. SPIE,
vol. 9785, p. 978513, 2016.
[138] E. L. Torres, E. Fiorina, F. Pennazio, C. Peroni, M. Saletta, N. Camarlinghi, M. E.
Fantacci, and P. Cerello, “Large scale validation of the M5L lung CAD on
heterogeneous CT datasets,” Med. Phys., vol. 42, pp. 1477–1489, 2015.
[139] D. P. Chakraborty, “Maximum likelihood analysis of free-response receiver operating
characteristic (FROC) data,” Med. Phys., vol. 16, no. 4, pp. 561–568, Jul. 1989.
APPENDIX A
DICOM ENCODING STRUCTURE
A DICOM image has the same basic structure as other image formats: a file header (describing
the overall structure of the file) followed by the data sets (which include the pixel data). The
file header itself has two parts: the first is a 128-byte preamble filled with 00H bytes, and the
second is a 4-byte file type identification field containing the characters "DICM", which
indicates that the file is in DICOM format.
The second part of the file is the dataset section. It encapsulates several datasets (groups), each
identified by a 2-byte Group Number (GN). Each dataset is complex in structure and content
and is categorized by function into the Default Dataset, the Standard Dataset and the Private
Dataset. The Default Dataset consists of groups 0001 through 0007 and group FFFF, some of
which must not be omitted from a DICOM file. For example, group 0002 contains the transfer
syntax, which tells the parsing program how to read the remaining data.
The Standard Dataset comprises all even-numbered groups that remain after the Default Dataset
is removed, such as group 0008, which records basic image information including the image
type and image identification numbers, and group 0010, which records patient information
such as the patient's name and age. The Private Dataset comprises all odd-numbered groups
other than those of the Default Dataset and group FFFF. Private datasets are needed because
they allow major hardware device manufacturers to store their own customized information.
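The three-way grouping above can be expressed as a small lookup. The thesis does not list its parsing code, so the following is only a minimal Python sketch of the categorization described in the text (it is the thesis's own scheme, not a classification defined by the DICOM standard); the function name classify_group is hypothetical.

```python
def classify_group(group: int) -> str:
    """Return "default", "standard" or "private" for a 2-byte group number.

    Follows the categorization described in the text: groups 0001-0007 and
    FFFF are "default"; remaining even groups are "standard"; remaining odd
    groups are "private" (manufacturer-defined).
    """
    if not 0x0000 <= group <= 0xFFFF:
        raise ValueError("a group number occupies exactly 2 bytes")
    if 0x0001 <= group <= 0x0007 or group == 0xFFFF:
        return "default"
    if group % 2 == 0:
        return "standard"  # e.g. 0008 (image information), 0010 (patient information)
    return "private"


# Usage: classify a few groups mentioned in this appendix.
for g in (0x0002, 0x0008, 0x0010, 0x0009, 0x7FE0):
    print(f"{g:04X}: {classify_group(g)}")
```

Because the rule is a pure function of the group number, a parser can apply it to each tag as it is read, without consulting any other part of the file.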
Because the information about the patient and the manufacturer's equipment is complex in type
and content, each dataset is decomposed into several data elements, each distinguished by an
Element Number (EN). Briefly, a data element consists of four main parts: the tag, the value
representation (VR), the value length (VL) and the value field (VF). The layout of each part is
shown in Tables A-1 and A-2.
Table A-1: Data elements of explicit VR for type OB, OW, OF, SQ, UT, or UN
Field: Group number (GN) | Element number (EN) | VR (OB, OW, OF, SQ, UT, UN) | Reserved bits (0000H) | Value length (VL) | Value field (VF)
Size: 2 bytes | 2 bytes | 2 bytes | 2 bytes | 4 bytes | As given by VL
Table A-2: Data elements for other types of explicit VRs
Field: Group number (GN) | Element number (EN) | VR (other) | Value length (VL) | Value field (VF)
Size: 2 bytes | 2 bytes | 2 bytes | 2 bytes | As given by VL
In both tables, GN and EN together form the element tag, and the value field holds the
binary-converted information.
(1) Element tag. The element tag is a 4-byte unsigned integer: the first 2 bytes hold the
element's group number and the last 2 bytes hold its element number. The tag serves as the
index of the data element, so the parsing code can use it to locate and query elements quickly.
(2) Value representation (VR). The VR is a 2-byte field that indicates the data type of the
element. If the transfer syntax uses implicit VR, the field is omitted. If the transfer syntax uses
explicit VR, the field contains the code of the element's data type, as defined in the VR
description of the standard. There are 27 such codes, for example AS (Age String), CS (Code
String) and DA (Date).
(3) Value length (VL). The VL gives the length of the value field in bytes. It occupies 4 bytes in
implicit VR encoding, and in explicit VR encoding for the types OB, OW, OF, SQ, UT and UN
(Table A-1); for all other explicit VRs it occupies 2 bytes (Table A-2). The VL must be an even
number of bytes, so a value of odd length is padded with one byte, typically a space for text
values or 00H for UI and binary values. For SQ elements, the VL may take the special value
FFFFFFFFH, which denotes an undefined length.
(4) Value field (VF). The value field stores the actual value of the data element. For example, a
date (DA) is stored as a string such as "20000101" (YYYYMMDD).
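The layouts in Tables A-1 and A-2 translate directly into a reader for one data element. The following is a minimal sketch, not the thesis's implementation: it assumes the explicit VR little endian transfer syntax, uses only Python's standard struct module, and deliberately rejects undefined-length sequences (VL = FFFFFFFFH). The function name read_explicit_le_element is hypothetical.

```python
import struct

# Explicit VRs that use 2 reserved bytes followed by a 4-byte VL (Table A-1);
# all other explicit VRs use a 2-byte VL (Table A-2).
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}
UNDEFINED_LENGTH = 0xFFFFFFFF


def read_explicit_le_element(buf: bytes, offset: int):
    """Parse one explicit VR little endian data element starting at `offset`.

    Returns ((group, element), vr, value, next_offset).
    """
    group, element = struct.unpack_from("<HH", buf, offset)  # 4-byte tag: GN then EN
    vr = buf[offset + 4:offset + 6]                          # 2-byte VR code
    if vr in LONG_VRS:
        # Skip the 2 reserved bytes (0000H), then read a 4-byte VL.
        (length,) = struct.unpack_from("<I", buf, offset + 8)
        value_start = offset + 12
    else:
        (length,) = struct.unpack_from("<H", buf, offset + 6)
        value_start = offset + 8
    if length == UNDEFINED_LENGTH:
        raise NotImplementedError("undefined-length sequences are not handled in this sketch")
    if value_start + length > len(buf):
        raise ValueError("value field runs past the end of the buffer")
    value = buf[value_start:value_start + length]
    return (group, element), vr.decode("ascii"), value, value_start + length


# Usage: Patient's Age (0010,1010) with VR AS, followed by a short OW element.
age = struct.pack("<HH", 0x0010, 0x1010) + b"AS" + struct.pack("<H", 4) + b"045Y"
pixels = (struct.pack("<HH", 0x7FE0, 0x0010) + b"OW" + b"\x00\x00"
          + struct.pack("<I", 4) + b"\x01\x00\x02\x00")
stream = age + pixels
tag, vr, value, nxt = read_explicit_le_element(stream, 0)
print(tag, vr, value)
print(read_explicit_le_element(stream, nxt)[:3])
```

The 2-byte versus 4-byte VL branch is the only point where the two table layouts differ. With an implicit VR transfer syntax the VR field is absent and the VL is always 4 bytes, so a separate code path would be needed.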
The pixel data are stored in data element (7FE0,0010). Depending on the communication or
storage requirements, they may be stored with lossy compression, with lossless compression or
uncompressed. Since no transmission is required in this thesis, the uncompressed DICOM images
are used directly: each pixel occupies 2 bytes, of which 16, 12 or 8 bits (usually 12) carry image
information. The data element structure is shown in Figure A-1.
Figure A-1: Structure of data elements (a data set is a sequence of data elements; each data element consists of an element tag, value type, value length and range)
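For uncompressed pixel data stored as described above, decoding amounts to reading 2-byte little-endian words and keeping only the stored bits. The following is a minimal sketch, assuming unsigned pixels whose valid bits occupy the low-order end of each 16-bit word (12 bits stored, high bit 11); the function name decode_pixels is hypothetical.

```python
import struct


def decode_pixels(pixel_bytes: bytes, rows: int, cols: int, bits_stored: int = 12):
    """Decode uncompressed, 16-bit-allocated, little-endian pixel data into rows.

    Assumes unsigned pixels with the stored bits in the low-order part of each word.
    """
    count = rows * cols
    if len(pixel_bytes) < 2 * count:
        raise ValueError("pixel data is shorter than rows * cols * 2 bytes")
    words = struct.unpack_from(f"<{count}H", pixel_bytes)  # 2 bytes per pixel
    mask = (1 << bits_stored) - 1                          # 0x0FFF for 12 bits
    flat = [w & mask for w in words]                       # drop the unused high bits
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Usage: a 2 x 2 image whose third word carries garbage in its 4 unused high bits.
raw = struct.pack("<4H", 0, 4095, 0xF123, 1024)
print(decode_pixels(raw, rows=2, cols=2))  # [[0, 4095], [291, 1024]]
```

Masking rather than trusting all 16 bits guards against files in which the unused high bits are not zero; signed pixel data would additionally need sign extension, which this sketch omits.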
DICOM CT Image Analysis
The proposed algorithm reads a DICOM image by first obtaining the basic file information and
checking whether the file is a DICOM file. It then obtains the image information and the pixel
data and finally decodes them. The overall process is similar to reading other image formats;
the main difference lies in acquiring and checking the structural information of the image
before the pixel data can be read. The parsing process is shown in Figure A-2.
1. Load the DICOM image file into computer memory. Skip the 128-byte preamble of 00H bytes,
read the next 4 bytes and convert them to characters using their ASCII codes. If the characters
are "DICM", the file is a DICOM image; otherwise it is not.
Figure A-2: DICOM Image Resolution Flow Chart (Start → DICOM image or not? (Y/N) → Parse transfer syntax → Parse image information → Extract pixel information → Convert and display → End)
2. Read the first part of the dataset, that is, the default groups 0002 to 0007, which provide the
rules for reading the subsequent datasets. For example, group 0002 contains the transfer syntax,
which determines how the length and position of the subsequent data are read. The transfer
syntax stored in data element (0002,0010) falls mainly into three categories: implicit VR little
endian, explicit VR little endian and explicit VR big endian. Table A-3 shows the
correspondence between transfer syntax values and their meanings.
Table A-3: Transfer Syntax Comparison Table
Transfer syntax UID | Meaning
1.2.840.10008.1.2 | Implicit VR Little Endian Transfer Syntax
1.2.840.10008.1.2.1 | Explicit VR Little Endian Transfer Syntax
1.2.840.10008.1.2.2 | Explicit VR Big Endian Transfer Syntax
3. Read the second part of the dataset, that is, group 0008 onwards. Element (0008,0005) holds
the Specific Character Set attribute, whose values are either ISO_IR character sets or other
character sets. Group 0010 contains patient and diagnostic information, such as the patient's
age, sex, date of birth and other basic information. Group 0028 contains image information,
such as the image size, bit width, bits allocated, window width, window center and grayscale
specifications. When parsing image files, the image dimensions and other image attributes are
essential, because they provide the basis for the next step.
4. Locate the pixel data element (7FE0,0010), which has the largest number of bytes of all the
data elements, and use its position information to check whether the image data are intact or
corrupted. As learnt in the previous step, 16 bits are allocated per pixel, of which 12 bits carry
valid pixel information. The pixel data are extracted according to the transfer syntax rules and
assembled into the complete pixel array.
5. Once the complete pixel data have been obtained, a corresponding file header (e.g. BMP or
JPG) could be added so that the image can be read by ordinary viewers. This research, however,
uses a different approach. Based on the grayscale distribution of the pixels, each 16-bit pixel can
be converted to an 8-bit pixel. A grayscale value can also be converted to a CT value (in
Hounsfield units, HU), a parameter widely used in medical imaging. The CT value is a linear
function of the gray value, computed from the rescale slope and intercept stored in the DICOM
data elements. CT values distinguish the composition of tissues; in lung CT images they range
from -1000 HU to +1000 HU. The CT value of lung parenchyma is generally around -600 HU,
that of air is -1000 HU, that of water is 0 HU and that of bone is about +1000 HU.
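The two conversions in step 5 can be sketched as follows. The linear gray-to-CT relation uses the Rescale Slope (0028,1053) and Rescale Intercept (0028,1052) elements. The 16-to-8-bit reduction is shown here as a plain linear window, a common choice that is not necessarily the grayscale-distribution mapping used in this thesis. The default intercept of -1024 and the lung window (center -600 HU, width 1500 HU) are illustrative assumptions, and both function names are hypothetical.

```python
def to_hounsfield(stored_value: int, slope: float = 1.0, intercept: float = -1024.0) -> float:
    """Linear gray-value-to-CT-value mapping: HU = stored_value * slope + intercept.

    slope and intercept would normally be read from Rescale Slope (0028,1053)
    and Rescale Intercept (0028,1052); the defaults here are assumptions.
    """
    return stored_value * slope + intercept


def window_to_8bit(hu: float, center: float = -600.0, width: float = 1500.0) -> int:
    """Map a CT value to 0..255 with a linear window [center - width/2, center + width/2]."""
    low = center - width / 2.0
    fraction = (hu - low) / width
    fraction = min(max(fraction, 0.0), 1.0)  # clip values outside the window
    return round(fraction * 255)


# Usage: stored values that map to air, lung parenchyma, water and bone.
for stored in (24, 424, 1024, 2024):
    hu = to_hounsfield(stored)
    print(stored, hu, window_to_8bit(hu))
```

A fixed window keeps the lung parenchyma in the middle of the 8-bit range while saturating bone and air; a histogram-based mapping would instead adapt the window to each image.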
An example of the binary encoding of a DICOM file is shown in Figure A-3, and the DICOM
parsing is described below with reference to this example.
Figure A-3: DICOM File Binary Encoding
00H: the 128-byte preamble of 00H bytes.
44 49 43 4D: the ASCII codes of the 4 characters "DICM".
02 00 10 00: the tag of data element (0002,0010), the transfer syntax UID (group and element
numbers are stored little endian).
55 49: the ASCII codes of "UI", the value representation (unique identifier).
12 00: the value length, hexadecimal 0x0012, i.e. 18 bytes.
31 2E 32 2E ... 31 2E 32 00: the value "1.2.840.10008.1.2" (Implicit VR Little Endian transfer
syntax), padded with 00H to an even length. Following the tag positions and the encoding rules,
the remaining information can be read sequentially in the same way, until the image
information and the pixel data are finally obtained and converted.
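The byte sequence of Figure A-3 can be rebuilt and decoded in a few lines, which also illustrates the check of step 1 and the lookup of Table A-3. This is a sketch under stated assumptions: like the figure, it expects (0002,0010) to be the first element after "DICM" (real files normally begin the meta group with (0002,0000), so a general reader would scan group 0002 instead), and the function names are hypothetical.

```python
import struct

# Transfer syntax UIDs and their meanings (Table A-3).
TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": "Implicit VR Little Endian",
    "1.2.840.10008.1.2.1": "Explicit VR Little Endian",
    "1.2.840.10008.1.2.2": "Explicit VR Big Endian",
}


def is_dicom(data: bytes) -> bool:
    """Step 1: skip the 128-byte preamble and look for the ASCII prefix "DICM"."""
    return len(data) >= 132 and data[128:132] == b"DICM"


def read_transfer_syntax(data: bytes) -> str:
    """Decode the (0002,0010) element that directly follows "DICM", as in Figure A-3.

    The file meta group is encoded as explicit VR little endian; UI uses a 2-byte VL.
    """
    if not is_dicom(data):
        raise ValueError("no DICM prefix: not a DICOM file")
    offset = 132
    group, element = struct.unpack_from("<HH", data, offset)  # bytes 02 00 10 00
    vr = data[offset + 4:offset + 6]                          # bytes 55 49 = "UI"
    (length,) = struct.unpack_from("<H", data, offset + 6)    # bytes 12 00 = 18
    if (group, element) != (0x0002, 0x0010) or vr != b"UI":
        raise ValueError("first element is not the transfer syntax UID")
    raw_uid = data[offset + 8:offset + 8 + length]
    uid = raw_uid.rstrip(b"\x00").decode("ascii")             # drop the 00H padding byte
    return TRANSFER_SYNTAXES.get(uid, "unknown transfer syntax " + uid)


# Usage: the bytes of Figure A-3.
sample = (bytes(128) + b"DICM"
          + struct.pack("<HH", 0x0002, 0x0010) + b"UI" + struct.pack("<H", 18)
          + b"1.2.840.10008.1.2\x00")
print(read_transfer_syntax(sample))
```

Since the syntax decoded here is implicit VR little endian, the data set that follows the meta group would be read without VR fields and with 4-byte value lengths.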
DICOM Images and Other Formats
Through its conformance statements, the DICOM standard specifies and supplements the
documentation and information necessary for medical imaging, and it also provides private data
sets for different manufacturers. These private datasets have no meaning by themselves; they are
defined and used by individual device manufacturers. The main differences between DICOM
and other image formats are described below.
Complexity of Image Format
The DICOM data structure keeps the file header minimal, with only the basic image identifier at
the beginning of the file. All other information is sorted by group number and element number
and forms structured, complete records of image, patient, equipment and diagnostic
information; many private dataset definitions are also reserved for different manufacturers.
Other image formats, in contrast, contain only basic file information and pixel information,
and their structure is simple.
Image Parsing Efficiency
Information in a DICOM image is located by its ‘Group Number’ and ‘Element Number’. This
indexing mechanism makes it easy to query and access specific information. In other image
formats, information is located at specified byte positions and the actual position is computed
through pointers, which makes queries slower; the position of the image pixel data must
likewise be determined and adjusted in this way.
Uniqueness of the Image
DICOM is a special image format intended for medical image diagnosis. A DICOM file records
the image information, patient information, diagnostic information, overlay information,
device information and so on. Unlike general-purpose formats such as BMP and JPEG, it is not
versatile: because it is more specific and specialized, it can only be applied to medical
diagnostics.
The encoding principles of DICOM and other common image formats are broadly similar, but
they differ considerably in structural mechanism and application area, and DICOM is of
particular significance in medical applications.
APPENDIX B
ANALYSIS OF XML INFORMATION OF PULMONARY NODULES
The diagnosis of early lung cancer is quite difficult. A non-medical person is unlikely to
recognize the symptoms of pulmonary nodules, and even experienced radiologists may misdiagnose
or miss nodules. A standard reference is therefore needed for research. The LIDC database
provides researchers not only with DICOM image sequences but also with the diagnostic
information of pulmonary nodules, produced by four expert radiologists in a two-phase
annotation process and supplied as an XML (Extensible Markup Language) file. This XML
diagnostic information from the four radiologists includes the case identifier, the numbers of
nodules and non-nodules, the nodules’ contour coordinates and so on. Because researchers
developing a detection system cannot diagnose pulmonary nodules themselves, whereas the four
radiologists provide comprehensive and authoritative diagnostic information, the XML document
serves as the standard reference for pulmonary nodules and provides a basis for comparing the
results of the proposed system. The XML file is analyzed in the following steps:
1. Load the pulmonary nodule XML file, read the version number and date, and read the study
instance identification number.
2. Go to each <readingSession> node. Each such node holds the reading of one radiologist and
contains the radiologist's ID, the nodule and non-nodule annotations, and the contour
coordinates.
3. Within each <readingSession>, go to the radiologist ID node <servicingRadiologistID>, the
pulmonary nodule nodes <unblindedReadNodule>, the non-nodule nodes <nonNodule> and other
nodes. Record the number of radiologists and the numbers of nodules and non-nodules.
4. For each <unblindedReadNodule> node, query its child nodes to obtain the nodule number
<noduleID>, the nodule characteristics <characteristics> and the nodule region <roi>.
<characteristics> contains features such as calcification, while <roi> contains the image slice
position, the nodule identification flag <inclusion> (TRUE or FALSE) and the contour
coordinates. Obtain the relevant non-nodule information from the <nonNodule> nodes in the same
way.
5. Extract the nodule contour coordinates from each nodule region <roi>; these provide the data
for subsequent decisions. The structure of the LIDC XML file is shown in Figure B-1.
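The five steps above can be sketched with Python's standard `xml.etree.ElementTree` module. The sample document is made up and only follows the node names of Figure B-1; the root element `LidcReadMessage`, the `ResponseHeader` wrapper, the `http://www.nih.gov` default namespace and the `<xCoord>`/`<yCoord>` children of `<edgeMap>` and `<locus>` are assumptions about the LIDC files rather than details given in this appendix. The `{*}` wildcard (Python 3.8+) matches a tag in any namespace or none, so the same code works whether or not a namespace is declared. The function names `parse_lidc_xml` and `local` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample following Figure B-1 (namespace and coordinate tags assumed).
SAMPLE = """<LidcReadMessage xmlns="http://www.nih.gov">
  <ResponseHeader><StudyInstanceUID>1.2.3</StudyInstanceUID></ResponseHeader>
  <readingSession>
    <servicingRadiologistID>R1</servicingRadiologistID>
    <unblindedReadNodule>
      <noduleID>N1</noduleID>
      <characteristics><calcification>6</calcification></characteristics>
      <roi>
        <imageZposition>-100.5</imageZposition>
        <imageSOP_UID>1.2.3.4.5</imageSOP_UID>
        <inclusion>TRUE</inclusion>
        <edgeMap><xCoord>10</xCoord><yCoord>20</yCoord></edgeMap>
        <edgeMap><xCoord>11</xCoord><yCoord>21</yCoord></edgeMap>
      </roi>
    </unblindedReadNodule>
    <nonNodule>
      <nonNoduleID>NN1</nonNoduleID>
      <imageZposition>-90.0</imageZposition>
      <locus><xCoord>30</xCoord><yCoord>40</yCoord></locus>
    </nonNodule>
  </readingSession>
</LidcReadMessage>"""


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parse_lidc_xml(text: str) -> dict:
    root = ET.fromstring(text)
    # Step 1: study instance identification number.
    result = {"study_uid": root.findtext(".//{*}StudyInstanceUID"), "sessions": []}
    # Steps 2-3: one <readingSession> per radiologist.
    for rs in root.findall(".//{*}readingSession"):
        session = {"radiologist": rs.findtext("{*}servicingRadiologistID"),
                   "nodules": [], "non_nodules": []}
        # Step 4: nodule number, characteristics and regions.
        for nod in rs.findall("{*}unblindedReadNodule"):
            chars = nod.find("{*}characteristics")
            rois = []
            for roi in nod.findall("{*}roi"):
                # Step 5: contour coordinates from each <edgeMap>.
                contour = [(int(e.findtext("{*}xCoord")), int(e.findtext("{*}yCoord")))
                           for e in roi.findall("{*}edgeMap")]
                rois.append({"z": float(roi.findtext("{*}imageZposition")),
                             "inclusion": roi.findtext("{*}inclusion") == "TRUE",
                             "contour": contour})
            session["nodules"].append({
                "id": nod.findtext("{*}noduleID"),
                "characteristics": {local(c.tag): c.text for c in chars} if chars is not None else {},
                "rois": rois})
        # Non-nodules are recorded the same way, with a single <locus> point.
        for nn in rs.findall("{*}nonNodule"):
            loc = nn.find("{*}locus")
            session["non_nodules"].append({
                "id": nn.findtext("{*}nonNoduleID"),
                "z": float(nn.findtext("{*}imageZposition")),
                "locus": (int(loc.findtext("{*}xCoord")), int(loc.findtext("{*}yCoord")))})
        result["sessions"].append(session)
    return result


info = parse_lidc_xml(SAMPLE)
print(len(info["sessions"]), info["sessions"][0]["nodules"][0]["rois"][0]["contour"])
# 1 [(10, 20), (11, 21)]
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring(text)`; counting the entries of `info["sessions"]` gives the number of radiologists recorded in step 3.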
XML file
    <SeriesInstanceUID> Series instance identification number
    <StudyInstanceUID> Study instance identification number
    <readingSession> Radiologist diagnostic information (one per radiologist: ①, ②, ③, ④ ...)
        <servicingRadiologistID> Servicing radiologist ID
        <unblindedReadNodule> Nodule information
            <noduleID> Nodule number
            <characteristics> Nodule characteristics
            <roi> Nodule contour
                <imageZposition>
                <imageSOP_UID>
                <inclusion> Nodule identification
                <edgeMap> Nodule contour coordinates
        <nonNodule> Non-nodule information
            <nonNoduleID>
            <imageZposition>
            <locus>
            <imageSOP_UID>
Figure B-1: Structure of Pulmonary Nodules’ XML
ABBREVIATIONS
WHO: World Health Organization
DICOM: Digital Imaging and Communications in Medicine
LDA: Linear Discriminant Analysis
KNN: K-Nearest-Neighbour
SVM: Support Vector Machine
LIDC: Lung Image Database Consortium
ELCAP: Early Lung Cancer Action Program
ROI: Region of Interest
CAD: Computer Aided Detection
CT: Computed Tomography
PET: Positron Emission Tomography
MRI: Magnetic Resonance Imaging
FP: False Positive
FP/scan: False Positives per Scan
HU: Hounsfield Unit
GLCM: Gray-Level Co-occurrence Matrix
IDM: Inverse Difference Moment
ROC: Receiver Operating Characteristic
FROC: Free-response ROC
TCIA: The Cancer Imaging Archive
LIDC-IDRI: Lung Image Database Consortium and Image Database Resource Initiative
GS: Gold Standard
DSC: Dice similarity coefficient
ACCU: Accuracy
OM: Overlap measure
SEN: Sensitivity
SPEC: Specificity
PPV: Positive Predictive Value
RmsD: Root Mean Square Difference of the Distance Between the Segmentation and the
Ground Truth
AD: Mean Absolute Surface Distance
GN: Group Number
EN: Element Number
VR: Value Representation
VL: Value Length
VF: Value Field
AS: Age String
CS: Code String
DA: Date
BMP: Bitmap
JPEG: Joint Photographic Experts Group
ASCII: American Standard Code for Information Interchange
XML: Extensible Markup Language
GGO: Ground Glass Opacity
LOLA11: Lobe and Lung Analysis 2011
MGRF: Markov-Gibbs Random Field
GSS: Gaussian Scale Space
PA: Postero-Anterior
SIFT: Scale Invariant Feature Transform
PCA: Principal Component Analysis
EM: Expectation-Maximization
ACACM: Adaptive Crisp Active Contour Method
RBF: Radial Basis Function