
Deep Learning COVID-19 Features on CXR using Limited Training Data Sets

Yujin Oh1, Sangjoon Park1, and Jong Chul Ye, Fellow, IEEE

Abstract—Under the global pandemic of COVID-19, the use of artificial intelligence to analyze chest X-ray (CXR) images for COVID-19 diagnosis and patient triage is becoming important. Unfortunately, due to the emergent nature of the COVID-19 pandemic, a systematic collection of the CXR data set for deep neural network training is difficult. To address this problem, here we propose a patch-based convolutional neural network approach with a relatively small number of trainable parameters for COVID-19 diagnosis. The proposed method is inspired by our statistical analysis of the potential imaging biomarkers of the CXR radiographs. Experimental results show that our method achieves state-of-the-art performance and provides clinically interpretable saliency maps, which are useful for COVID-19 diagnosis and patient triage.

Index Terms—COVID-19, Chest X-Ray, Deep Learning, Segmentation, Classification, Saliency Map

I. INTRODUCTION

CORONAVIRUS disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a global pandemic in less than four months since it was first reported, reaching 1.8 million confirmed cases and 109,000 deaths as of April 12th, 2020. Due to its highly contagious nature and the lack of appropriate treatments and vaccines, early detection of COVID-19 is becoming increasingly important to prevent further spreading and to flatten the curve for proper allocation of limited medical resources.

Currently, reverse transcription polymerase chain reaction (RT-PCR), which detects viral nucleic acid, is the gold standard for COVID-19 diagnosis, but RT-PCR results using nasopharyngeal and throat swabs can be affected by sampling errors and low viral load [1]. Antigen tests may be fast, but have poor sensitivity.

Since most COVID-19 infected patients were diagnosed with pneumonia, radiological examinations may be useful for diagnosis and assessment of disease progression. Chest computed tomography (CT) screening on initial patient presentation showed higher sensitivity than RT-PCR [2] and even confirmed COVID-19 infection in negative or weakly-positive RT-PCR cases [1]. Accordingly, the recent COVID-19 radiological literature is primarily focused on CT findings [2], [3]. However, as the prevalence of COVID-19 increases, the routine use of CT places a huge burden on radiology departments and risks contamination of CT suites, so the need to recognize COVID-19 features on chest X-ray (CXR) is increasing.

1: Co-first authors with equal contribution. Authors are with the Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea (E-mail: {yujin.oh,depecher,jong.ye}@kaist.ac.kr).

Common chest X-ray findings mirror those described by CT, such as bilateral, peripheral consolidation and/or ground glass opacities [2], [3]. Specifically, Wong et al. [4] described frequent chest X-ray (CXR) appearances of COVID-19. Unfortunately, chest X-ray findings are reported to have a lower sensitivity than initial RT-PCR testing (69% versus 91%, respectively) [4]. Despite this low sensitivity, CXR abnormalities were detectable in 9% of patients whose initial RT-PCR was negative. As the COVID-19 pandemic threatens to overwhelm healthcare systems worldwide, if the diagnostic performance with CXR is improved, CXR may be considered as a tool for identifying COVID-19. Even if CXR cannot completely replace RT-PCR, the indication of pneumonia is a clinical manifestation of patients at higher risk requiring hospitalisation, so CXR can be used for triage, determining the priority of patients' treatment to help a saturated healthcare system in the pandemic situation.

Accordingly, deep learning (DL) approaches on chest X-ray for COVID-19 classification have been actively explored [5]–[11]. In particular, Wang et al. [5] proposed an open source deep convolutional neural network platform called COVID-Net that is tailored for the detection of COVID-19 cases from chest radiography images. They reported that COVID-Net can achieve 80% sensitivity for COVID-19 cases.

Inspired by this early success, in this paper we aim to further investigate deep convolutional neural networks and evaluate their feasibility for COVID-19 diagnosis. Unfortunately, under the current public health emergency, it is difficult to collect a large set of well-curated data for training neural networks. Therefore, one of the main focuses of this paper is to develop a neural network architecture that is suitable for training with a limited training data set and can still produce radiologically interpretable results. In particular, since the most frequently observed distribution patterns of COVID-19 in CXR are bilateral involvement, peripheral distribution and ground-glass opacification (GGO) [12], a properly designed neural network should reflect such radiological findings.

To achieve this goal, we first investigate several imaging biomarkers that are often used in CXR analysis, such as the lung area intensity distribution and the cardio-thoracic ratio. Our analysis found statistically significant differences in the patch-wise intensity distribution, which is well correlated with the radiological findings of localized intensity variations in COVID-19 CXR. These findings lead us to propose a novel patch-based deep neural network architecture with random patch cropping, from which the final classification result is obtained by majority voting


Fig. 1: Overall architecture of the proposed neural network approach: (a) Segmentation network, and (b) Classification network

from inference results at multiple patch locations. One of the important advantages of the proposed method is that, thanks to the patch-based training, the network complexity is relatively small and each patch can be used as a training sample, so that even with the limited data set the neural network can be trained efficiently without overfitting. Moreover, the resulting class activation maps clearly show interpretable results that are well correlated with radiological findings. We demonstrate that the proposed network architecture provides better sensitivity and interpretability compared to the existing COVID-Net [5] on the same data set.

II. PROPOSED NETWORK ARCHITECTURE

The overall algorithmic framework is given in Fig. 1. The CXR images are first processed for data normalization, after which the processed data are fed into a segmentation network, from which lung areas are extracted as shown in Fig. 1(a). From the segmented lung area, a classification network is used to classify the corresponding diseases using patch-by-patch training and inference, after which the final decision is made based on majority voting as shown in Fig. 1(b). In the following, each network is described in detail.

A. Segmentation network

Our segmentation network aims to extract the lung and heart contours from chest radiography images. We adopted an extended fully convolutional DenseNet to perform semantic segmentation [13]. The training objective is

$$\arg\min_{\Theta}\; \mathcal{L}(\Theta) \qquad (1)$$

where $\mathcal{L}(\Theta)$ is the cross entropy loss of the multi-categorical semantic segmentation and $\Theta$ denotes the network parameter set, which is composed of filter kernel weights and biases. Specifically, $\mathcal{L}(\Theta)$ is defined as

$$\mathcal{L}(\Theta) = -\sum_{c}\sum_{j} \lambda_c \, \mathbb{1}(y_j = c)\, \log\!\big(f_{\Theta}(x_j)\big) \qquad (2)$$

where $\mathbb{1}(\cdot)$ is the indicator function, $f_{\Theta}(x_j)$ denotes the softmax probability of the $j$-th pixel in a CXR image $x$, and $y_j$ denotes the ground truth label. $c$ denotes the class category, i.e., $c \in \{\text{background, heart, left lung, right lung}\}$, and $\lambda_c$ denotes the weight given to each class category.
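For concreteness, the weighted cross entropy of Eq. (2) can be written with standard PyTorch primitives as in the following sketch; the class ordering and weight values shown here are illustrative assumptions, not necessarily the values used in the experiments.

```python
import torch
import torch.nn as nn

# Assumed class order: background, heart, left lung, right lung.
# The weight vector plays the role of lambda_c in Eq. (2); the values are placeholders.
class_weights = torch.tensor([0.25, 1.0, 1.0, 1.0])

# nn.CrossEntropyLoss applies log-softmax internally, so `logits` are the raw network
# outputs of shape (batch, 4, H, W) and `labels` hold the ground-truth index y_j of
# every pixel, shape (batch, H, W).
criterion = nn.CrossEntropyLoss(weight=class_weights)

def segmentation_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Weighted multi-categorical cross entropy L(Theta) of Eq. (2)."""
    return criterion(logits, labels)
```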

CXR images from different dataset resources may induce heterogeneity in their bit depth, compression type, image size, acquisition conditions, scanning protocol, postprocessing, etc. Therefore, we developed a universal preprocessing step for data normalization to ensure a uniform intensity histogram throughout the entire dataset. The preprocessing steps are as follows:

1) Histogram equalization (gray level = [0, 255.0])
2) Gamma correction (γ = 0.5)
3) Image resize (height, width = [256, 256])
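For illustration, these three steps can be sketched as follows, assuming an 8-bit single-channel input and using OpenCV; the handling of other bit depths is omitted.

```python
import cv2
import numpy as np

def preprocess_cxr(img: np.ndarray) -> np.ndarray:
    """Histogram equalization, gamma correction (gamma = 0.5), and resize to 256 x 256."""
    img = cv2.equalizeHist(img.astype(np.uint8))          # 1) histogram equalization to [0, 255]
    img = 255.0 * (img / 255.0) ** 0.5                    # 2) gamma correction with gamma = 0.5
    img = cv2.resize(img.astype(np.uint8), (256, 256),    # 3) resize to (height, width) = (256, 256)
                     interpolation=cv2.INTER_AREA)
    return img
```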


Then, we trained FC-DenseNet103 [13] as our backbone segmentation network architecture. Network parameters were randomly initialized. We applied the Adam optimizer [14] with an initial learning rate of 0.0001. Whenever the training loss did not improve by a certain criterion, the learning rate was reduced by a factor of 10. We adopted an early stopping strategy based on validation performance. The batch size was set to 2. We implemented the network using the PyTorch library [15].

B. Classification network

The classification network aims to classify the chest X-ray images according to the type of disease. Since it is known that overfitting can occur when an overly complex model is used for a small amount of data, we used ResNet-18 as the backbone of our classification algorithm to reduce the model complexity.

The labels were divided into four classes: normal, bacterial pneumonia, tuberculosis (TB), and viral pneumonia, which includes the pneumonia caused by COVID-19 infection. We assigned viral pneumonia from other viruses (e.g. SARS-CoV or MERS-CoV) to the same class as COVID-19, since it is reported that they have similar radiologic features that are challenging to distinguish even for experienced radiologists [16]. Instead, we concentrated on more feasible tasks such as distinguishing bacterial pneumonia or tuberculosis from viral pneumonia, which show considerable differences in radiologic features and are still useful for patient triage.

The pre-processed images were first masked with the lung masks from the segmentation network and then fed into a classification network. The classification network was implemented in two different versions: a global and a local approach. In the global approach, the masked images were resized to 224 × 224 and fed into the network. This approach focuses on the global appearance of the CXR data and was used as a baseline network for comparison. In fact, many existing studies employ a similar procedure [5]–[8].

In the local approach, which is our proposed method, the masked images were cropped randomly with a size of 224 × 224, and the resulting patches were used as the network inputs as shown in Fig. 1(b). In contrast to the global approach, the original-size CXR images are used for the classification network to fully reflect the original pixel distribution, so the segmentation masks from Fig. 1(a) are upsampled to match the original CXR image size. To avoid cropping patches from the empty area of the masked image, the centers of the patches were randomly selected within the lung areas. During inference, N patches were randomly acquired for each image to represent the attributes of the whole image. The number N was chosen to cover all lung pixels multiple times. Each patch was then fed into the network to generate a network output, and among the N network outputs the final decision was made by majority voting, i.e. the most frequently declared class was regarded as the final output, as depicted in Fig. 1(b). In our experiments, the number of random patches N was set to 100, which means that 100 patches were generated randomly from one whole image for majority voting.
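The random patch sampling and majority voting described above can be summarized by the following sketch. This is a simplified illustration: `model` is the trained classifier, `masked_img` is the lung-masked CXR at its original size, and `lung_mask` is the upsampled segmentation mask; a single-channel network input is assumed.

```python
import numpy as np
import torch

def classify_by_majority_voting(model, masked_img, lung_mask, n_patches=100, patch=224):
    """Randomly crop patches centered inside the lung area and vote over their predictions."""
    model.eval()
    h, w = masked_img.shape
    ys, xs = np.where(lung_mask > 0)                     # candidate patch centers within the lungs
    votes = []
    with torch.no_grad():
        for _ in range(n_patches):
            i = np.random.randint(len(ys))
            cy, cx = ys[i], xs[i]
            # Clamp the crop window so the 224 x 224 patch stays inside the image.
            top = int(np.clip(cy - patch // 2, 0, h - patch))
            left = int(np.clip(cx - patch // 2, 0, w - patch))
            crop = masked_img[top:top + patch, left:left + patch]
            x = torch.from_numpy(crop).float().unsqueeze(0).unsqueeze(0)  # (1, 1, 224, 224)
            votes.append(int(model(x).argmax(dim=1)))
    # Majority voting: the most frequently predicted class is the final decision.
    return int(np.bincount(votes).argmax())
```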

For network training, ImageNet pre-trained parameters were used for network weight initialization, after which the network was trained using the CXR data. We found that this transfer learning scheme makes the training stable even when the available dataset is limited. As for the optimization algorithm, the Adam optimizer [14] with a learning rate of 0.00001 was applied. The network was trained for 100 epochs, but we adopted an early stopping strategy based on validation performance metrics. A batch size of 16 was used. We applied weight decay and L1 regularization to prevent overfitting. The classification network was also implemented with the PyTorch library.
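A minimal sketch of this training setup (ImageNet-pretrained ResNet-18, Adam with a learning rate of 1e-5, weight decay, and an explicit L1 penalty) is given below; the weight decay and L1 coefficients are assumptions, since their exact values are not stated above.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_CLASSES = 4  # normal, bacterial, TB, viral (including COVID-19)

model = resnet18(pretrained=True)                        # ImageNet pre-trained initialization
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # replace the final layer

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

def training_step(images, labels, l1_lambda=1e-5):
    """One optimization step with cross entropy plus an L1 penalty on the weights."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
    loss.backward()
    optimizer.step()
    return loss.item()
```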

III. METHOD

A. Dataset

We used public CXR datasets, whose characteristics are summarized in Table I and Table II. In particular, the data in Table I are used for training and validation of the segmentation network, since ground-truth segmentation masks are available. The curated data in Table II come from some of the data in Table I as well as other COVID-19 resources, and were used for training, validation, and testing of the classification network. More detailed descriptions of the datasets follow.

1) Segmentation network dataset: The JSRT dataset was released by the Japanese Society of Radiological Technology (JSRT) [17]. A total of 247 chest posteroanterior (PA) radiographs were collected from 14 institutions, including normal and lung nodule cases. The corresponding segmentation masks were collected from the SCR database [18]. The JSRT/SCR dataset was randomly split into training (80%) and validation (20%) sets. For cross-database validation, we used another public CXR dataset, the U.S. National Library of Medicine (USNLM) Montgomery County (MC) dataset [19]. A total of 138 chest PA radiographs were collected, including normal and tuberculosis cases, with corresponding lung segmentation masks.

TABLE I: Segmentation dataset resources

Split        Dataset              Class           #    Bits  Lung mask  Heart mask
Training     JSRT/SCR [17], [18]  Normal/Nodule   197  12    O          O
Validation   JSRT/SCR             Normal/Nodule   50   12    O          O
Validation   NLM(MC) [19]         Normal          73   8     O          -

TABLE II: Classification data set resources

Dataset            Class                  #    Bits  Lung mask  Heart mask
JSRT/SCR           Normal                 20   12    O          O
NLM(MC)            Normal                 73   8     O          -
CoronaHack [20]    Normal                 98   24    -          -
NLM(MC)            Tuberculosis           57   8     O          -
CoronaHack         Pneumonia (Bacteria)   21   24    -          -
Cohen et al. [21]  Pneumonia (Bacteria)   33   24    -          -
Cohen et al. [21]  Pneumonia (Virus)      20   24    -          -
Cohen et al. [21]  Pneumonia (COVID-19)   180  24    -          -

2) Classification dataset: The dataset resources for the classification network are described in Table II. Specifically, for normal cases, the JSRT dataset and the NLM dataset from the validation dataset were included. To compare COVID-19 with normal and different lung diseases, data were also collected from other sources [20], [21], including additional normal cases. These datasets were selected because they are fully accessible to any research group and they provide labels with a detailed diagnosis of disease. This enables a more


specific classification of pneumonia into bacterial and viral pneumonia, which should be classified separately because of their distinct clinical and radiologic differences.

In the data collected from the public dataset [20], over 80% were pediatric CXRs from the Guangzhou Women and Children's Medical Center [22]. Therefore, to prevent the network from learning biased features from age-related characteristics, we excluded pediatric CXR images. To the best of our knowledge, this is the first attempt to classify COVID-19 with DL approaches using CXR radiographs with a uniform age distribution.

TABLE III: Disease class summary of the data set

Dataset      Normal  Bacterial  Tuberculosis  Viral  COVID-19  Total
Training     134     39         41            14     126       354
Validation   19      5          5             2      18        49
Test         38      10         11            4      36        99
Total        191     54         57            20     180       502

The total dataset was curated into five classes: normal, tuberculosis, bacterial pneumonia, viral pneumonia, and COVID-19 pneumonia. The numbers of images in each disease class are summarized in Table III. Specifically, a total of 180 radiography images of 118 subjects from the COVID-19 image data collection were included. Moreover, a total of 322 chest radiography images from different subjects were used, which include 191, 54, 57, and 20 images for normal, bacterial pneumonia, tuberculosis, and viral pneumonia (not including COVID-19), respectively. The combined dataset was randomly split into training, validation, and test sets with ratios of 0.7, 0.1, and 0.2.

3) Dataset for comparison with COVID-Net: We prepared a separate dataset to compare our method with the existing state-of-the-art (SOTA) algorithm called COVID-Net [5]. The COVID-19 image data collection was combined with the RSNA Pneumonia Detection Challenge dataset as described in [5] for a fair comparison between our method and COVID-Net. The reason we separately train our network with the COVID-Net data set is that the RSNA Pneumonia Detection Challenge dataset provides only information on the presence of pneumonia, rather than a detailed diagnosis of disease, so the labels were divided into only three categories, normal, pneumonia, and COVID-19, as in Table IV. More specifically, there were 8,851 normal and 6,012 pneumonia chest radiography images from 13,645 patients in the RSNA Pneumonia Detection Challenge dataset, and these images were combined with the COVID-19 image data collection to compose the total dataset. Among these, 100 normal, 100 pneumonia, and 10 COVID-19 images were randomly selected for the validation and test sets, respectively, as in [5]. Although we believe our categorization into normal, bacterial, TB, and viral+COVID-19 cases is more correlated with the radiological findings and practically useful in a clinical environment [16], we conducted this additional comparison experiment with the data set in Table IV to demonstrate that our algorithm provides competitive performance in the same experimental set-up.

B. Statistical Analysis of Potential CXR COVID-19 Markers

The following standard biomarkers from CXR image analysis are investigated:

TABLE IV: Dataset for comparison with COVID-Net

Dataset      Normal  Pneumonia  COVID-19  Total
Training     8651    5812       160       14603
Validation   100     100        10        210
Test         100     100        10        210
Total        8851    6012       180       15043

• Lung morphology: Morphological structures of the segmented lung area, as illustrated in Fig. 2(b), were evaluated across the different classes.

• Mean lung intensity: From the segmented lung area, we calculated the mean value of the pixel intensity within the lung area, as shown in Fig. 2(c).

• Standard deviation of lung intensity: From the intensity histogram of the lung area pixels, we calculated one standard deviation, which is indicated by the black double-headed arrow in Fig. 2(c).

• Cardiothoracic ratio (CTR): The CTR is calculated by dividing the maximal transverse cardiac diameter by the maximal internal thoracic diameter, annotated respectively as the red and blue double-headed arrows in Fig. 2(a). The CTR is a widely used marker to diagnose cardiomegaly [23], [24]. We hypothesized that if the cardiothoracic boundary becomes blurred by rounded opacities or consolidation in COVID-19 CXR [2]–[4], a distinctly off-average CTR value could be utilized as an abnormality alarm.
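As an illustration, the CTR can be estimated from the predicted heart and lung masks roughly as follows; this is a sketch under the assumption that binary numpy masks are available, and the exact diameter measurement used for Fig. 2(a) may differ in detail.

```python
import numpy as np

def max_horizontal_extent(mask: np.ndarray) -> int:
    """Widest horizontal extent (in pixels) of a binary mask over all rows."""
    widths = []
    for row in mask:
        cols = np.where(row)[0]
        if cols.size:
            widths.append(cols.max() - cols.min() + 1)
    return max(widths) if widths else 0

def cardiothoracic_ratio(heart_mask: np.ndarray, lung_mask: np.ndarray) -> float:
    """CTR = maximal transverse cardiac diameter / maximal internal thoracic diameter."""
    cardiac = max_horizontal_extent(heart_mask)
    thoracic = max_horizontal_extent(lung_mask)   # both lung fields combined
    return cardiac / thoracic
```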

Statistical analysis of the potential biomarkers was performed using MATLAB 2015a (Mathworks, Natick). The Kolmogorov–Smirnov test was used to evaluate the normality of the marker candidates' distributions. For non-normally distributed variables, the Wilcoxon signed-rank test was used to compare segmentation performance with identical data sizes, and the Wilcoxon rank-sum test was used to compare COVID-19 marker candidates with those of the other classes with different data sizes. Statistical significance (SS) levels are indicated by asterisks: * for p < 0.05, ** for p < 0.01 and *** for p < 0.001.
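The same tests are available in SciPy; the following sketch shows how a marker comparison could be reproduced in Python (the analysis above was carried out in MATLAB, and the array variables here are placeholders).

```python
import numpy as np
from scipy import stats

def compare_marker(covid_values: np.ndarray, other_values: np.ndarray):
    """Normality check followed by a rank-based comparison between two unpaired groups."""
    # Kolmogorov-Smirnov test against a normal distribution fitted to the sample.
    _, p_normal = stats.kstest(covid_values, "norm",
                               args=(covid_values.mean(), covid_values.std()))
    # Unequal group sizes: Wilcoxon rank-sum test.
    # (For the paired segmentation comparison, stats.wilcoxon would be used instead.)
    _, p_value = stats.ranksums(covid_values, other_values)
    return p_normal, p_value

def significance_label(p: float) -> str:
    """Asterisk convention used in the tables."""
    return "***" if p < 0.001 else "**" if p < 0.01 else "*" if p < 0.05 else "-"
```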

Fig. 2: (a) Segmentation result. The lung and heart segments are overlaid on the CXR in blue and red, respectively, and the green line represents the ground truth. (b) Extracted lung areas, and (c) the corresponding lung area pixel intensity histogram.

C. Classification performance metrics

The performance of the classification methods was evaluated using the confusion matrix. From the confusion matrix,


the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values were obtained, and five metrics for performance evaluation were calculated as below:

1) Accuracy = (TN + TP) / (TN + TP + FN + FP)
2) Precision = TP / (TP + FP)
3) Recall = TP / (TP + FN)
4) F1 score = 2 (Precision × Recall) / (Precision + Recall)
5) Specificity = TN / (TN + FP)

Among these, the F1 score was used as the evaluation metric for early stopping. The overall metric scores of the algorithm were calculated by averaging each metric over the multiple classes.
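A sketch of this computation from a multi-class confusion matrix (one-vs-rest per class, then averaged) is given below for illustration:

```python
import numpy as np

def metrics_from_confusion(cm: np.ndarray) -> dict:
    """Per-class one-vs-rest metrics averaged over classes; cm[i, j] counts true class i predicted as j."""
    scores = {"accuracy": [], "precision": [], "recall": [], "f1": [], "specificity": []}
    for c in range(cm.shape[0]):
        tp = cm[c, c]
        fn = cm[c, :].sum() - tp
        fp = cm[:, c].sum() - tp
        tn = cm.sum() - tp - fn - fp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        scores["accuracy"].append((tp + tn) / cm.sum())
        scores["precision"].append(precision)
        scores["recall"].append(recall)
        scores["f1"].append(2 * precision * recall / (precision + recall)
                            if (precision + recall) else 0.0)
        scores["specificity"].append(tn / (tn + fp) if (tn + fp) else 0.0)
    return {name: float(np.mean(vals)) for name, vals in scores.items()}
```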

D. Saliency map visualization

In addition, we investigated the interpretability of our approach by visualizing GradCAM-based saliency maps [25]. In the global approach, the GradCAM-based saliency map was obtained from layer 4 of ResNet-18 for a given chest radiography image. In the local patch-based approach, the GradCAM-based saliency maps were obtained for each patch similarly to the global approach, and the resulting saliency maps from all patches were then averaged to reconstruct a single saliency map for the given image.
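A compact sketch of how such a patch-averaged saliency map can be assembled is shown below; the hook-based Grad-CAM follows the standard recipe on the layer4 feature maps of ResNet-18, while the fusion of overlapping patch maps by simple averaging is an assumed, simplified reading of the description above.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_class):
    """Grad-CAM for one input patch, computed at the layer4 feature maps of ResNet-18."""
    store = {}
    def hook(module, inputs, output):
        store["act"] = output                                  # (1, 512, h, w) activations
        output.register_hook(lambda g: store.update(grad=g))   # their gradients on backward
    handle = model.layer4.register_forward_hook(hook)
    logits = model(x)
    model.zero_grad()
    logits[0, target_class].backward()
    handle.remove()
    weights = store["grad"].mean(dim=(2, 3), keepdim=True)             # channel importance
    cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True))    # weighted activation map
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    return cam[0, 0].detach()

def patch_averaged_saliency(model, image_shape, patches, target_class):
    """Average the per-patch Grad-CAM maps back onto the full-image canvas."""
    heat = torch.zeros(image_shape)    # accumulated saliency
    count = torch.zeros(image_shape)   # number of patches covering each pixel
    for top, left, crop in patches:    # crop: (1, C, 224, 224) tensor of one sampled patch
        cam = grad_cam(model, crop, target_class)
        h, w = cam.shape
        heat[top:top + h, left:left + w] += cam
        count[top:top + h, left:left + w] += 1
    return heat / count.clamp(min=1)
```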

IV. EXPERIMENTAL RESULTS

A. Segmentation performance on cross-database

Segmentation performance for the anatomical structures was evaluated using the Jaccard similarity coefficient. Table V presents the Jaccard similarity coefficient of each contour on the validation dataset. The results confirm comparable accuracy to previous work using the JSRT dataset [26].
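The Jaccard similarity coefficient is simply the intersection over union of the predicted and ground-truth masks:

```python
import numpy as np

def jaccard_index(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0
```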

TABLE V: CXR segmentation results

Dataset    Preprocess  Jaccard Index (Lung)  Jaccard Index (Heart)
JSRT       O           0.955 ± 0.015         0.889 ± 0.054
NLM(MC)    -           0.932 ± 0.022         -
NLM(MC)    O           0.943 ± 0.013         -

*** indicates a statistically significant difference (p < 0.001) between the NLM(MC) lung results with and without preprocessing.

To evaluate segmentation performance across databases, we tested either the original or the preprocessed images of the NLM dataset as inputs. The result shows that our universal preprocessing step for data normalization contributes to the processing of cross-database data, with a statistically significant improvement in segmentation accuracy (Jaccard similarity from 0.932 to 0.943, p < 0.001). This result indicates that preprocessing is a crucial factor in ensuring segmentation performance across databases.

B. Morphological analysis of lung area

To analyze the morphological characteristics of the segmented lung area, a representative CXR radiograph of each class was selected for visual evaluation. The lung contours of each class showed differentiable features and mild tendencies. In the normal and tuberculosis cases (the first and second rows of Fig. 3, respectively), the overall lung and heart contours were well segmented. In the third row of Fig. 3, which corresponds to a

Fig. 3: Preprocessed images, the corresponding segmentation results, and the extracted lung contours are shown along the column axis. Each row depicts a different categorical class.

bacterial infection case, the segmented lung area was deformed due to the widespread opacity of bacterial pneumonia, and both the right cardiac and thoracic borders were lost. Across the bacterial infection cases, similar findings were occasionally observed, which degraded segmentation performance. In the fourth row of Fig. 3, viral infection caused bilateral consolidations [27], and thus partial deformation of the lung area was observed. In the COVID-19 case in the fifth row of Fig. 3, despite the bi-basal infiltrations [28], the lung area was fully segmented. Across the viral and COVID-19 classes, lung areas were either normally or partially-incompletely segmented. Based on these morphological findings in the segmented lung area, we further analyzed the potential COVID-19 biomarkers.

C. Statistical significance of potential COVID-19 biomarkers

We hypothesized that if segmentation performance mainly depends on pixel intensity, the CXR appearance influenced by the consolidations or infiltrations of COVID-19 may be reflected in the segmentation result, as shown in Fig. 3. Thus, intensity-related


Fig. 4: Scatter plots (left) and corresponding mean values with one standard deviation error bars (right) for (a) lung area mean intensity, (b) lung area intensity standard deviation, and (c) cardiothoracic ratio. All parameter values are normalised to an arbitrary unit.

COVID-19 marker candidates were extracted and compared with those of the other classes.

1) Lung area intensity: The mean pixel intensity of each lung area is shown in the scatter plot of Fig. 4(a). COVID-19 cases showed lower mean intensity than the other cases at a statistically significant level (p < 0.001 for normal and bacterial, p < 0.01 for TB). Table VI lists the corresponding statistical results. Despite the statistical significance, the scatter plot shows broad overlap between several classes.

2) Lung area intensity variance: The standard deviation of the pixel intensity of each lung area is shown in the scatter plot of Fig. 4(b). For both the COVID-19 and viral cases, the variance values were higher than those of the other classes with statistical significance (p < 0.001 for all). Table VII lists the corresponding statistical results.

To investigate the effect of the scanning protocol on these statistics, we performed an additional study by excluding AP supine radiographs, identified from the documented patient information, from the entire dataset. Recall that the AP supine protocol is an alternative to the standard PA or AP protocol, depending on patient condition. Since the AP supine protocol is not common in normal cases, supine scanning with different acquisition conditions may introduce considerable heterogeneity in the data distribution and bias the statistical analysis, so we investigated this issue. The result in Table VIII, compared to Table VII, shows only a minor difference for both the COVID-19 and viral cases. This indicates that, for both the COVID-19 and viral classes, the highly intensity-variable characteristic of the lung area is invariant to the scanning protocol.

3) Cardiothoracic ratio: The CTR values of each lung area are shown in the scatter plot of Fig. 4(c). Although there are statistical differences between the COVID-19 cases and the other classes (p < 0.001 for normal and TB, p < 0.05 for bacteria), the scatter plot shows broad overlap between several classes. Table IX lists the corresponding statistical results.

TABLE VI: Lung area intensity statistics

Class          Mean    STD     vs. Normal   vs. TB   vs. Bacteria   vs. Viral
Normal         0.540   0.055
Tuberculosis   0.523   0.043   *
Bacteria       0.558   0.054   -            ***
Viral          0.509   0.047   **           -        ***
COVID-19       0.506   0.051   ***          **       ***            -

TABLE VII: Lung area intensity variance statistics

Class          Mean    STD     vs. Normal   vs. TB   vs. Bacteria   vs. Viral
Normal         0.139   0.017
Tuberculosis   0.136   0.016   -
Bacteria       0.145   0.019   *            **
Viral          0.165   0.022   ***          ***      ***
COVID-19       0.163   0.022   ***          ***      ***            -

TABLE VIII: Lung area intensity variance statistics excluding AP supine radiographs

Class          Mean    STD     vs. Normal   vs. TB   vs. Bacteria   vs. Viral
Normal         0.139   0.017
Tuberculosis   0.136   0.016   -
Bacteria       0.143   0.020   -            -
Viral          0.163   0.025   ***          ***      **
COVID-19       0.161   0.022   ***          ***      ***            -

Based on the statistical analysis of the potential biomarker candidates, we found that the intensity distribution pattern within the lung area may be the most effective for diagnosis, as it highly reflects the reported chest X-ray (CXR) appearances of COVID-19, i.e., multi-focally distributed consolidation and GGO in specific regions such as the peripheral and lower zones [2]–[4].


TABLE IX: Cardiothoracic ratio statistics

Class          Mean    STD     vs. Normal   vs. TB   vs. Bacteria   vs. Viral
Normal         0.446   0.051
Tuberculosis   0.476   0.078   *
Bacteria       0.472   0.074   -            -
Viral          0.502   0.064   ***          *        -
COVID-19       0.499   0.086   ***          ***      *              -

However, care should be taken, since not only can locally concentrated multiple opacities cause an uneven intensity distribution throughout the entire lung area, but different texture distributions within the CXR may also cause similar intensity variations. For example, the multi-focally distributed consolidation of COVID-19 could make the intensity variance a factor differentiating it from other classes, but bacterial pneumonia also generates opacities, whose features may lead to similar intensity distributions as a result of different opacity spreading patterns.

To decouple these compounding effects, we further investigated the local and global intensity distributions. For the correctly classified patches from our classification network, we computed their mean intensity and standard deviation (STD) values. We defined the distribution of the mean intensity of each patch as the inter-patch intensity distribution (Fig. 5(a)) and the STD of each patch as the intra-patch intensity distribution (Fig. 5(b)). As shown in Fig. 5(a), the inter-patch intensity distribution of the unified COVID-19 and viral class showed distinctly lower intensity values (p < 0.001 for all) than the other classes, and a highly intensity-variant characteristic, represented by the large error bar. This result is in accordance with the lung area intensity and intensity variance results (Fig. 4(a), (b)). The intra-patch intensity distribution, however, showed no difference compared to the normal class (p > 0.05). From these intra- and inter-patch intensity distribution results, we can infer that the intra-patch variance, which represents local texture information, was not crucially informative, whereas the globally distributed multi-focal intensity change may be an important discriminating feature for COVID-19 diagnosis, which is strongly correlated with the radiological findings.
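Both quantities follow directly from the correctly classified patches; a small sketch (the list of grayscale patch arrays is a placeholder):

```python
import numpy as np

def patch_intensity_statistics(patches):
    """Inter-patch distribution: the mean intensity of each patch.
    Intra-patch distribution: the standard deviation within each patch."""
    inter_patch = np.array([p.mean() for p in patches])  # one mean per patch
    intra_patch = np.array([p.std() for p in patches])   # one STD per patch
    return inter_patch, intra_patch
```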

One common finding among the marker candidates was the absence of a difference between the COVID-19 and viral cases (p > 0.05 for all markers), which is also consistent with radiological findings [16]. Therefore, in the classification network, the COVID-19 and viral classes were integrated into one class.

D. Classification performance

The classification performances of the proposed method are provided in Table X. The confusion matrices for (a) the global method and (b) the local patch-based method are shown in Fig. 6. The proposed local patch-based approach showed consistently better performance than the global approach in all metrics. Overall, the local patch-based method showed acceptable performance even with the small dataset, while the global method did not.


Fig. 5: Scatter plots (left) and corresponding mean values with one standard deviation error bars (right) for (a) the inter-patch and (b) the intra-patch intensity distributions. Each point depicts a patch that was correctly classified to its ground truth label. All parameter values were normalised to an arbitrary unit. Classes statistically differentiable from the COVID-19 and viral cases (p < 0.001) are marked at each error bar.

TABLE X: Classification results from the global approach and the proposed patch-based classification network.

Methods          Accuracy  Precision  Recall  F1 score  Specificity
Global approach  70.7      60.6       60.1    59.3      89.7
Local approach   88.9      83.4       85.9    84.4      96.4

E. Interpretability using saliency map

Fig. 7 and Fig. 8 illustrate examples of the visualization of the network using GradCAM-based saliency maps. Specifically, in Fig. 7, our local patch-based approach exhibits the areas of importance in the CXR of a COVID-19 patient more precisely than the global approach. The global approach using the whole image shows the limitation that it concentrates mainly on a broad area of the image, which may not be the area of importance from the viewpoint of clinical experts.

When comparing the saliency maps from an image of a normal subject and that of a COVID-19 patient, significant differences were noticeable, as shown in Fig. 8. Consistent with radiological intuition, the activation levels for the normal class were evenly distributed throughout the lungs in the normal image. On the


Fig. 6: Confusion matrices for (a) the global approach and (b) the proposed local patch-based approach.

Fig. 7: Example of a GradCAM-based saliency map for a COVID-19 patient. (a) Global approach, and (b) the proposed patch-based approach.

other hand, the image of the COVID-19 patient showed increased COVID-19 class activation mainly in the periphery and patchy consolidations, which is consistent with the findings reported by clinical experts. Furthermore, our saliency map clearly correlated with the patchy consolidation areas identified by radiologists, which are marked by the arrows in Fig. 7 and Fig. 8.

Fig. 8: Comparison of saliency maps for (a) a normal subject and (b) a COVID-19 patient.

V. DISCUSSION

A. COVID-19 Features on CXR

In the diagnosis of COVID-19, other diseases mimicking COVID-19 pneumonia should be differentiated, including community-acquired pneumonia such as streptococcus pneumonia, mycoplasma and chlamydia related pneumonia, and other coronavirus infections.

In the radiological literature, the most frequently observed distribution patterns of COVID-19 are bilateral involvement, peripheral distribution and ground-glass opacification (GGO) [12]. Wong et al. [4] found that consolidation was the most common finding (30/64, 47%), followed by GGO (21/64, 33%). CXR abnormalities had a peripheral (26/64, 41%) and lower zone distribution (32/64, 50%) with bilateral involvement (32/64, 50%), whereas pleural effusion was uncommon (2/64, 3%).

Our statistical analysis of the intensity distribution clearly showed that the globally distributed localized intensity variation is a discriminatory factor for COVID-19 CXR images, which was also confirmed by our saliency maps. This confirms that the proposed method reflects the radiological findings.

B. Feasibility as a ’triage’ for COVID-19

In a pandemic of infectious disease, the distribution of medical resources is a matter of the greatest importance. As COVID-19 is spreading rapidly and surpassing the capacity of the medical system in many countries, it is necessary to make reasonable decisions to distribute the limited resources based on 'triage', which determines the needs and urgency of each patient. In this respect, diseases such as bacterial pneumonia or tuberculosis, as well as the normal condition, can be excluded primarily, to preserve limited medical resources such as RT-PCR or CT only for those suspected to be infected with COVID-19. The detailed triage workflow that utilizes the proposed algorithm is described in Fig. 9. Specifically, our neural network is trained to classify other viral pneumonia and COVID-19 into the same class. This is not only because it is strongly correlated with the radiological findings [16], but also because it is useful for triage. More specifically, by excluding normal, bacterial pneumonia, and TB at the early stage, we can use RT-PCR


or CT only for those patients classified as other virus or COVID-19 cases for the final diagnosis. By doing so, we can reserve limited medical resources such as RT-PCR or CT for those patients whose diagnosis by CXR is difficult even for radiologists.

Fig. 9: Potential triage workflow that utilizes the proposed algorithm in the diagnosis of COVID-19 patients.

C. Training stability

In order to investigate the origin of the apparent advantages of local patch-based training over the global approach, we examined the training dynamics to check for the presence of overfitting. This is especially important given that the training data are limited, due to the difficulty of systematic CXR data collection for COVID-19 cases under the current public health emergency.

Fig. 10 shows the accuracy and F1 score curves of (a) the global approach and (b) the proposed local patch-based approach for each epoch. Note that both approaches use the same number of weight parameters. Nonetheless, thanks to the increased effective training data from random patch cropping across the whole image area, our local patch-based algorithm did not show any sign of overfitting even with the small number of training data, while the global approach showed a significant overfitting problem. This clearly indicates that, with a limited data set, patch-based neural network training may be a promising direction.

D. Comparison with COVID-Net

Since the proposed patch-based neural network architecture was designed with a limited training data set in mind, we investigated any potential performance loss in comparison with another state-of-the-art (SOTA) deep learning approach that was developed without such consideration. Specifically, COVID-Net [5] is one of the most successful approaches in COVID-19 diagnosis, so we chose it as the SOTA method.

The comparison between our method and COVID-Net is shown in Table XI. With the same dataset, our method showed an overall accuracy of 91.9%, which is comparable to 92.4% for COVID-Net. Furthermore, our method provided significantly improved sensitivity to COVID-19 cases

Fig. 10: Training and validation accuracy and F1 score for each epoch. (a) Global approach, and (b) the proposed patch-based approach.

compared to COVID-Net. In addition, it is also remarkable that our method uses only about 10% of the number of parameters (11.2 M) compared to COVID-Net (116.6 M), because the proposed algorithm is built on a less complex network architecture without increasing the model complexity. This may bring advantages not only in terms of computational time but also in terms of performance and stability with a small-sized dataset.

TABLE XI: Comparison of our method with COVID-Net

                 Sensitivity                      Precision
Methods          Normal  Pneumonia  COVID-19      Normal  Pneumonia  COVID-19
COVID-Net        95      91         80            91.3    93.8       88.9
Proposed         90      93         100           95.7    90.3       76.9

VI. CONCLUSION

In the rapidly evolving global pandemic of COVID-19, the use of CXR for COVID-19 diagnosis or triage for patient management has become an important issue for preserving limited medical resources and preventing further spreading of the virus. However, the current diagnostic performance with CXR is not sufficient for routine clinical use, so the need for artificial intelligence to improve the diagnostic performance of


CXR is increasing. Unfortunately, due to the emergent nature of the COVID-19 global pandemic, systematic collection of a large data set for deep neural network training is difficult.

To address this problem, we investigated potential biomarkers in the CXR and found that the globally distributed localized intensity variation is a discriminating feature for COVID-19. Based on this finding, we proposed a patch-based deep neural network architecture that can be stably trained with a small data set. Once the neural network is trained, the final decision is made based on majority voting from multiple patches at random locations within the lungs. Our experimental results demonstrated that the proposed network can be trained stably with a small data set, provides results comparable to the SOTA method, and generates interpretable saliency maps that are strongly correlated with the radiological findings.

ACKNOWLEDGMENT

We would like to thank the doctors and medical professionals from Korea and all around the world who have dedicated their time and efforts to treat COVID-19 patients and protect the health of citizens during this pandemic. This work was supported by the National Research Foundation of Korea under Grant NRF-2020R1A2B5B03001980.

REFERENCES

[1] X. Xie, Z. Zhong, W. Zhao, C. Zheng, F. Wang, and J. Liu, "Chest CT for typical 2019-nCoV pneumonia: relationship to negative RT-PCR testing," Radiology, p. 200343, 2020.

[2] Y. Fang, H. Zhang, J. Xie, M. Lin, L. Ying, P. Pang, and W. Ji, "Sensitivity of chest CT for COVID-19: comparison to RT-PCR," Radiology, p. 200432, 2020.

[3] T. Ai, Z. Yang, H. Hou, C. Zhan, C. Chen, W. Lv, Q. Tao, Z. Sun, and L. Xia, "Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases," Radiology, p. 200642, 2020.

[4] H. Y. F. Wong, H. Y. S. Lam, A. H.-T. Fong, S. T. Leung, T. W.-Y. Chin, C. S. Y. Lo, M. M.-S. Lui, J. C. Y. Lee, K. W.-H. Chiu, T. Chung et al., "Frequency and distribution of chest radiographic findings in COVID-19 positive patients," Radiology, p. 201160, 2020.

[5] L. Wang and A. Wong, "COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest radiography images," arXiv preprint arXiv:2003.09871, 2020.

[6] A. Narin, C. Kaya, and Z. Pamuk, "Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks," arXiv preprint arXiv:2003.10849, 2020.

[7] I. D. Apostolopoulos and T. A. Mpesiana, "COVID-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks," Physical and Engineering Sciences in Medicine, p. 1, 2020.

[8] E. E.-D. Hemdan, M. A. Shouman, and M. E. Karar, "COVIDX-Net: A framework of deep learning classifiers to diagnose COVID-19 in X-ray images," arXiv preprint arXiv:2003.11055, 2020.

[9] I. Apostolopoulos, S. Aznaouridis, and M. Tzani, "Extracting possibly representative COVID-19 biomarkers from X-ray images with deep learning approach and image data related to pulmonary diseases," arXiv preprint arXiv:2004.00338, 2020.

[10] M. Farooq and A. Hafeez, "COVID-ResNet: a deep learning framework for screening of COVID19 from radiographs," arXiv preprint arXiv:2003.14395, 2020.

[11] P. Afshar, S. Heidarian, F. Naderkhani, A. Oikonomou, K. N. Plataniotis, and A. Mohammadi, "COVID-CAPS: A capsule network-based framework for identification of COVID-19 cases from X-ray images," arXiv preprint arXiv:2004.02696, 2020.

[12] S. Salehi, A. Abedi, S. Balakrishnan, and A. Gholamrezanezhad, "Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients," American Journal of Roentgenology, pp. 1–7, 2020.

[13] S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio, "The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 11–19.

[14] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.

[15] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," in NIPS Autodiff Workshop, 2017.

[16] S. H. Yoon, K. H. Lee, J. Y. Kim, Y. K. Lee, H. Ko, K. H. Kim, C. M. Park, and Y.-H. Kim, "Chest radiographic and CT findings of the 2019 novel coronavirus disease (COVID-19): analysis of nine patients treated in Korea," Korean Journal of Radiology, vol. 21, no. 4, pp. 494–500, 2020.

[17] J. Shiraishi, S. Katsuragawa, J. Ikezoe, T. Matsumoto, T. Kobayashi, K.-i. Komatsu, M. Matsui, H. Fujita, Y. Kodera, and K. Doi, "Development of a digital image database for chest radiographs with and without a lung nodule," American Journal of Roentgenology, vol. 174, no. 1, pp. 71–74, Jan 2000. [Online]. Available: https://doi.org/10.2214/ajr.174.1.1740071

[18] B. Van Ginneken, M. B. Stegmann, and M. Loog, "Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database," Medical Image Analysis, vol. 10, no. 1, pp. 19–40, 2006.

[19] S. Jaeger, S. Candemir, S. Antani, Y.-X. J. Wang, P.-X. Lu, and G. Thoma, "Two public chest X-ray datasets for computer-aided screening of pulmonary diseases," Quantitative Imaging in Medicine and Surgery, vol. 4, no. 6, p. 475, 2014.

[20] Praveen. CoronaHack: Chest X-Ray-Dataset. (2020, Mar 21). [Online]. Available: https://www.kaggle.com/praveengovi/coronahack-chest-xraydataset

[21] J. P. Cohen, P. Morrison, and L. Dao, "COVID-19 image data collection," arXiv 2003.11597, 2020. [Online]. Available: https://github.com/ieee8023/covid-chestxray-dataset

[22] D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan et al., "Identifying medical diagnoses and treatable diseases by image-based deep learning," Cell, vol. 172, no. 5, pp. 1122–1131, 2018.

[23] C. S. Danzer, "The cardiothoracic ratio: an index of cardiac enlargement," The American Journal of the Medical Sciences (1827-1924), vol. 157, no. 4, p. 513, 1919.

[24] K. Dimopoulos, G. Giannakoulas, I. Bendayan, E. Liodakis, R. Petraco, G.-P. Diller, M. F. Piepoli, L. Swan, M. Mullen, N. Best et al., "Cardiothoracic ratio from postero-anterior chest radiographs: a simple, reproducible and independent marker of disease severity and outcome in adults with congenital heart disease," International Journal of Cardiology, vol. 166, no. 2, pp. 453–457, 2013.

[25] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.

[26] S. Candemir and S. Antani, "A review on lung boundary detection in chest X-rays," International Journal of Computer Assisted Radiology and Surgery, vol. 14, no. 4, pp. 563–576, 2019.

[27] N. S. Paul, H. Roberts, J. Butany, T. Chung, W. Gold, S. Mehta, E. Konen, A. Rao, Y. Provost, H. H. Hong et al., "Radiologic pattern of disease in patients with severe acute respiratory syndrome: the Toronto experience," Radiographics, vol. 24, no. 2, pp. 553–563, 2004.

[28] I. Thevarajan, T. H. Nguyen, M. Koutsakos, J. Druce, L. Caly, C. E. van de Sandt, X. Jia, S. Nicholson, M. Catton, B. Cowie et al., "Breadth of concomitant immune responses prior to patient recovery: a case report of non-severe COVID-19," Nature Medicine, pp. 1–3, 2020.