


Noname manuscript No. (will be inserted by the editor)

A Systematic Collection of Medical Image Datasets for Deep Learning

Johann Li 1 · Guangming Zhu 1 · Cong Hua 1 · Mingtao Feng 1 · Basheer Bennamoun 2 · Ping Li 3 · Xiaoyuan Lu 3 · Juan Song 1 · Peiyi Shen 1 · Xu Xu 4 · Lin Mei 4 · Liang Zhang 1 · Syed Afaq Ali Shah 5 · Mohammed Bennamoun 6

Received: date / Accepted: date

Abstract The astounding success made by artificial intelligence (AI) in healthcare and other fields proves that AI can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data-dependent and require large datasets for training. The lack of data in the medical imaging field creates a bottleneck for the application of deep learning to medical image analysis. Medical image acquisition, annotation, and analysis are costly, and their usage is constrained by ethical restrictions. They also require many resources, such as human expertise and funding. That makes it difficult for non-medical researchers to have access to useful and large medical data. Thus, as comprehensively as possible, this paper provides a collection of medical image datasets with their associated challenges for deep learning research. We have collected information on around three hundred datasets and challenges, mainly reported between 2013 and 2020, and categorized them into four categories: head & neck, chest & abdomen, pathology & blood,

B Corresponding author: Guangming Zhu, Liang Zhang, and Syed Afaq Ali Shah
E-mail: [email protected]
E-mail: [email protected]
E-mail: [email protected]

1 School of Computer Science and Technology, Xidian University, China
2 School of Medicine, The University of Notre Dame, Australia
3 Data and Virtual Research Room, Shanghai Broadband Network Center, China
4 The Third Research Institute of The Ministry of Public Security, China
5 Discipline of Information Technology, Media and Communications, Murdoch University, Australia
6 Department of Computer Science and Software Engineering, The University of Western Australia, Australia

and “others”. Our paper has three purposes: 1) to provide a most up-to-date and complete list that can be used as a universal reference to easily find the datasets for clinical image analysis, 2) to guide researchers on the methodology to test and evaluate their methods’ performance and robustness on relevant datasets, and 3) to provide a “route” to the relevant algorithms for the relevant medical topics and challenge leaderboards.

Keywords Medical image analysis · Deep learning · Datasets · Challenges · Computer-aided diagnosis

1 Introduction

Since the invention of medical imaging technology, the field of medicine has entered a new era. The beginning of medical imaging started with the adoption of X-rays. With further technical advancements, many other imaging methods, including 3D computed tomography (CT), magnetic resonance imaging (MRI), nuclear medicine, ultrasound, endoscopy, and optical coherence tomography (OCT), were also exploited. Directly or indirectly, these imaging modalities have contributed to the diagnosis and treatment of various diseases, and to research on the human body’s structure and intrinsic mechanisms.

Medical images can provide critical insight into the diagnosis and treatment of many diseases. The human body’s different reactions to imaging modalities are used to produce scans of the body. Reflection and transmission are commonly used in medical imaging because the reflection or transmission ratios of different body tissues and substances differ. Some other methods acquire images by changing the energy transferred to the body, e.g., magnetic field changes or the rays radiated from a chemical agent.

arXiv:2106.12864v1 [eess.IV] 24 Jun 2021


Before modern AI was applied in medical image analysis, radiologists and pathologists needed to manually look for the critical “biomarkers” in the patient’s scans. These “biomarkers”, such as tumors and nodules, are the basis for medics to diagnose and devise treatment plans. Such a diagnostic process needs to be performed by medics with extensive medical knowledge and clinical experience. However, problems such as diagnostic bias and the lack of medical resources are prevalent and cannot be avoided. After the recent breakthroughs in AI (which achieves human-like performance, e.g., in image recognition [1,2,3], and can win games such as Go [4] and real-time strategy games [5]), the development of AI-based automatic medical image analysis algorithms has attracted a lot of attention. Recently, the application of AI in medical image analysis has become one of the major research focuses and has attained many significant achievements [6,7,8].

Many researchers have turned their focus to AI-based medical image analysis methods, reasoning that these might be one of the solutions to such challenges (e.g., medical resource scarcity) while taking advantage of the technological progress [9,10,11,12,13]. Traditional medical image analysis focuses on detecting and identifying biomarkers for diagnosis and treatment. AI imitates the medic’s diagnosis through classification, segmentation, detection, regression, and other AI tasks in an automated or semi-automated way.

AI has achieved significant performance on many computer vision tasks. This success is yet to be translated to the medical image analysis domain. Deep learning (DL), a branch of AI, is a data-dependent method, as it needs massive training data. However, when DL is applied to medical image analysis, the paucity of labeled data becomes a major challenge and a bottleneck.

Data scarcity is a common problem when applying DL methods to a specific domain, and this problem becomes more severe in the case of medical image analysis. Researchers who apply DL methods to medical image analysis are commonly computer scientists and do not usually have a medical background. They cannot collect data independently because of the lack of access to medical equipment and patients, and they cannot annotate the acquired data either, because they lack the relevant medical knowledge. Furthermore, medical data is owned by institutions that cannot easily make it public due to privacy and ethics restrictions. When researchers evaluate their algorithms on their private data, the results of their research become incomparable.

To address some of these problems, MICCAI, ISBI, AAPM, and other conferences and institutions have launched many DL-related medical image analysis challenges. These aim to design and develop automatic or semi-automatic algorithms and to promote medical image analysis research with computer-aided methods. Concurrently, some researchers and institutions also organize projects to collect medical datasets and publish them for research purposes.

Despite all these developments, it is still challenging for novice medical image analysis researchers to find medical data. This paper addresses this challenge and presents a comprehensive survey of existing medical datasets. The paper also identifies and summarizes medical image analysis challenges. It also provides a pathway to identify the most relevant datasets for evaluation and the suitable methods in the respective challenge leaderboards.

This paper refers to other research papers with a number between square brackets and refers to the datasets listed in the tables with numbers between parentheses.

The following sections present the details of the key datasets and challenges. Section 2 summarizes the datasets and challenges, including the years, body parts, tasks, and other information that is relevant to the dataset development. Section 3 discusses the datasets and challenges of the head and neck. Section 4 covers the datasets and challenges related to the chest and abdomen organs. Section 5 examines the datasets and challenges of pathology- and blood-related tasks. Section 6 introduces other datasets and challenges related to bone, skin, phantom, and animals. We have also created a website with a git repo1, which shows the list of these datasets and their respective challenges.

2 Medical image datasets

In this section, we provide an overview of the image datasets and challenges. Our collection contains over three hundred medical image datasets and challenges organized between 2004 and 2020. This paper focuses mainly on the ones between 2013 and 2020. Subsections 2.1, 2.2, 2.3, and 2.4 provide information about the year, body parts, modalities, and tasks, respectively. In Subsection 2.5, we introduce the sources from where we have collected these datasets and challenges. Details about the categorization of these image datasets and challenges into four groups are provided in the subsequent sections. We provide a taxonomy of our paper in Figure 1 to help the reader navigate through the different sections.

1 The website with the git repo will be public after the paper is accepted.


[Figure 1: a taxonomy diagram. Medical image datasets are divided into Head & Neck (Sec. 3: basic tasks, brain disease, eye, other subjects), Chest & Abdomen (Sec. 4: organ segmentation, diagnosis of diseases, other tasks), Pathology & Blood (Sec. 5: pathology, blood), and Other Subjects (Sec. 6: bone, skin, phantom, animal), each with further sub-topics.]

Fig. 1: An overall taxonomy to outline the organization of the paper.

2.1 Years

The timeline of these medical image datasets can be split in two, with 2013 as the watershed, following Krizhevsky et al.’s remarkable success in the ILSVRC competition with AlexNet [14] in 2012. The continuous advancement of deep learning has, to some extent, driven more and more researchers to focus on medical image analysis and indirectly led to an increase in the number of datasets and competitions each year. According to our statistics, the number of datasets and challenges before 2013 is irregular. The main reason is that many datasets developed before 2012 were not aimed at computer-aided diagnosis, for example, ADNI-1 (52), although those data could be used for DL. Therefore, we only focus on the datasets and challenges which were released after 2013.

Figure 2A shows the statistics of the datasets and challenges per year between 2013 and 2020. As shown in Figure 2A, the number of related datasets and challenges increased year by year because of the progress and success of DL in computer vision and medical image analysis. That led more and more researchers to focus on medical image analysis with DL-based methods, and more and more datasets and challenges covering different body parts and tasks started to appear.

As shown in Figures 2B and 2C, there was an increase not only in the number of datasets and challenges but also in their variety with respect to body parts and types of tasks. The research focus has ranged from simple diagnosis or structural analysis (e.g., segmentation and classification) in the early stages to more complex tasks, or combinations of tasks, that are closer to clinical needs, including classification, segmentation, detection, regression, generation, tracking, and registration. The focus of these datasets and challenges has also shifted from cancer diagnosis to the entire healthcare system. Meanwhile, the organs studied by researchers have also expanded from single, simple, but important ones, such as the brain and lungs, to many other parts of the human body with different sizes, shapes, and other characteristics.

2.2 Body parts

With the success of DL, the number of studied body parts has increased, as shown in Figure 2B. We also show the most frequently researched body parts in Figure 2E; the top-5 researched organs are the brain, lung, heart, eye, and liver. These organs have been the focus of research because they are among the most important parts of the human body.


[Figure 2: summary statistics. Panel A: datasets and challenges per year, 2013–2020: 7, 12, 24, 25, 28, 45, 59, 65. Panels B–D: year-by-year trends by body part (brain, heart, lung, liver, eye), task (segmentation, classification, detection, other), and modality (MRI, CT/CR/PET, pathology image, other). Panel E (body parts): brain 24%, lung 15%, heart 8%, eye 5%, bone 5%, liver 4%, neck 4%, prostate 2%, kidney 2%, breast 2%, other 29%. Panel F (modalities): MRI 35%, CT 24%, pathology image 11%, CR 8%, ultrasound 4%, fundus photo 3%, endoscopy 3%, PET 3%, OCT 1%, other 8%. Panel G (tasks): segmentation 40%, classification 31%, detection 14%, generation 5%, registration 4%, regression 4%, tracking 2%.]

Fig. 2: Summary of medical image datasets and challenges from 2013 to 2020. Figure 2A shows the number of datasets and challenges published in each year. Figures 2B, C, and D show the year-by-year trends along with the trends in relative numbers for each of the different categories by year. The numbers listed at the right are the summary counts of each category; the summary counts are not the same as the total numbers because 1) some of the categories are not shown, and 2) a dataset counts twice if it includes two of the categories. Figures 2E, F, and G show the most predominant body parts, data modalities, and main tasks with the percentage of their respective datasets.

In the beginning, the main reason which motivated researchers to focus on these organs and parts was that a simple diagnosis and a structural study greatly helped in the diagnosis and treatment of cancer (a major threat to human life). Many datasets focus on the brain, lungs, and other organs without considering DL, and many challenges focus on simple tasks, such as segmentation and classification. Subsequently, AI proved to be competent at tackling complex tasks, and researchers therefore started to focus on several other organs. For example, eye-related diseases, which cause blindness, incited the collection of eye-related datasets and the release of challenges. Some other datasets and challenges focus on small organs, such as the prostate, which are challenging to analyze due to the low resolution of the images.

2.3 Modalities

There are several types of medical image modalities. As shown in Figure 2F, the frequently used modalities to acquire medical datasets include Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Ultrasound (US), Endoscopy, Positron Emission Tomography (PET), Computed Radiography (CR), Electrocardiography, and Optical Coherence Tomography (OCT). We introduce these main modalities below and provide a summary at the end of this subsection.

Radiography: Radiography is an imaging technique based on the difference in attenuation when X-rays pass through the different organs and tissues of the human body. The primary modalities used include CR and CT. CR is a 2D image, and CT is a volume (3D) image. Radiography is the most commonly used method to image the human body. For example, CR is frequently used to diagnose chest-related diseases, such as pneumonia, tuberculosis, and COVID-19. Meanwhile, 3D CT


plays an important role in the diagnosis and treatment related to cancer and lesions. The advantages of radiography are 1) high resolution of the hard tissues (e.g., bones), 2) lower cost, and 3) compatibility with contrast agents, but the disadvantages are 1) X-rays are harmful to human health, 2) X-rays are not ideal for distinguishing between healthy tissues and tumors without the help of contrast agents, and 3) their resolution is limited by the radiation intensity. Moreover, as the main component of human bone is calcium, CT plays an important role in many bone-related diagnoses.

Magnetic resonance: MR images display the body’s structure via the difference in signal released by the different substances of the imaged organ as the magnetic field is changed. MR has many submodalities, such as T1 and T2. For essential organs and tissues, MR is a commonly used imaging method because it is considered non-invasive, effective, and safe. Due to the principle of MR imaging, MR plays an essential role in the diagnosis of the brain, heart, and soft tissues. Because higher-resolution MR images can be obtained by increasing the magnetic field strength, MR is also suitable for small organs and tissues. However, MR images do have disadvantages, such as high cost and incompatibility with metal (e.g., metallic orthopedic implants).

Nuclear medicine: Nuclear medicine captures images via the absorption by the targeted tissue of specific chemical components marked with radioactive isotopes. Tumors and healthy tissues absorb different chemical components, so medics use a specific chemical marked with a radioactive isotope and receive the rays radiated by the chemical. An example is Positron Emission Computed Tomography, i.e., PET, which performs imaging by capturing the radiation produced by fluorodeoxyglucose or other similar contrast agents absorbed by the tissue or tumor. Nuclear medicine is good at imaging regions of interest, such as tumors, but its disadvantages are high cost and low resolution.

Ultrasound: Ultrasound operates by acquiring the differences in the absorption and reflection of ultrasound waves when applied to tissues. It is widely used in imaging the heart and fetus because ultrasound causes no damage to these parts and provides real-time imaging. Nevertheless, the main disadvantage is the noise caused by the reflection off the irregular shapes of organs and tissues, which interferes with imaging.

Eye-related modalities: An OCT image is obtained by using low-coherence light to capture 2D and 3D micrometer-resolution images within optical scattering media to diagnose eye-related diseases. The fundus photo is also used for diagnosis purposes. These two methods are non-invasive, eye-specific imaging modalities.

Pathology: Pathological data is the gold standard in diagnosing diseases. It is acquired by photographing stained tissue slides through a microscope to show cell-level features. Pathology is used in the cell-level diagnosis of cancer and tumors.

Other modalities: Other imaging modalities are common but specific to certain body parts, such as endoscopy, and provide medics with various biomarkers to make critical decisions when diagnosing, treating, and researching.

Overall, MR, CT, and other modalities are the most commonly used imaging modalities. MR can provide sharp images of soft tissues without harmful radiation. It is therefore widely used in the imaging of the brain, heart, and many other small organs. CT is an economical and simple imaging approach, and it is widely used for the diagnosis of cancer, e.g., of the neck, chest, and abdomen. A pathology image is different from MR and CT because it is a cell-level imaging method. Pathology is widely used in cancer-related diagnosis.

2.4 Tasks

According to our analysis, our collected datasets and challenges have been used for the tasks of classification, prediction, detection, segmentation, location, characterization, up-sampling, tracking, registration, regression, estimation, coding, automatic annotation, and other tasks. As Figure 2G shows, we grouped these tasks into seven categories: classification, segmentation, detection, regression, generation, registration, and tracking. The following subsections briefly describe each task.

Classification: Classification is used for qualitative analysis. According to pre-defined specific rules, the classification task aims to group medical images or particular regions of an image into two or more distinct categories. The classification task can be used alone for medical image analysis or as a subsequent task after other lower-level tasks, such as segmentation and detection, in order to analyze the results and further extract features. There are many ways to express the classification task, such as detection and prediction. The


detection tasks (which are also sometimes termed classification) are different from the ones introduced in the following paragraph, although sometimes the same word is used synonymously. Typical examples of classification tasks include AD prediction and the attribute classification of pulmonary nodules. AD prediction aims to group MR images into Alzheimer’s disease (AD) and normal cognition (NC). The attribute classification of pulmonary nodules aims to analyze the pathology attributes of pulmonary nodules. Classification performance measures mainly include accuracy, precision, specificity, sensitivity, F-score, ROC, and AUC. All these measures are based on four basic counts: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
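The measures above can all be derived from the four basic counts. The following is an illustrative sketch (ours, not from the paper; the helper name `classification_metrics` is hypothetical):

```python
# Illustrative sketch: the listed classification metrics derived
# from the four basic counts TP, FP, TN, FN.
def classification_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "specificity": specificity,
            "f_score": f_score}

# Toy counts (assumed values for illustration only)
print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```

ROC and AUC are obtained by sweeping a decision threshold and plotting sensitivity against the false positive rate, so they are not single-count formulas like the ones above.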

Segmentation: The segmentation task can be regarded as a pixel-level or voxel-level classification task, with the difference that the segmentation task is constrained by the spatial context. It aims to split an image into different areas or to contour specific regions. The regions can contain a tumor, tissue, or other specific targets. The results of the segmentation task consist of areas and boundaries. Since segmentation can be seen as a pixel-level classification, the average precision (AP) can be used as a metric. Other performance metrics include intersection over union (IoU), the Dice index, the Jaccard index, the Hausdorff distance, and the average surface distance.
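As a minimal sketch of the overlap metrics named above (our illustration, assuming binary masks flattened to equal-length 0/1 sequences; the helper name `overlap_metrics` is ours):

```python
# Sketch: Dice and IoU (Jaccard) between two binary masks,
# represented here as flat 0/1 sequences of equal length.
def overlap_metrics(pred, target):
    inter = sum(p & t for p, t in zip(pred, target))
    p_sum, t_sum = sum(pred), sum(target)
    dice = 2 * inter / (p_sum + t_sum)
    iou = inter / (p_sum + t_sum - inter)  # union = |P| + |T| - |P ∩ T|
    return dice, iou

# A 1-D toy "mask": intersection 2, so Dice = 2*2/(3+3), IoU = 2/4
print(overlap_metrics([1, 1, 1, 0, 0], [0, 1, 1, 1, 0]))
```

Note that Dice ≥ IoU for any pair of masks, and the two are monotonically related, which is why challenges typically report only one of them.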

Detection: The detection task aims to find an object of interest, and it also usually needs to classify such an object (a classification task). In this work, we categorize as detection the tasks which aim to determine the location of the object of interest with a bounding box or a point. The detection task is sometimes presented as a localization task. A typical example of detection is pulmonary nodule detection, which aims to find the pulmonary nodules in chest CT images and annotate the nodules with bounding boxes. The performance measures used in detection tasks mainly include the intersection over union (IoU), mean average precision (mAP), precision and recall, the false positive rate, the receiver operating characteristic (ROC) curve, and other metrics. For tasks that locate an object without a boundary, the Euclidean distance is the most commonly used measure.
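For bounding boxes, IoU is computed on box areas rather than pixel masks. A hedged sketch (ours; the corner-coordinate box format `(x1, y1, x2, y2)` is our assumption, not something fixed by the paper):

```python
# Sketch: IoU between two axis-aligned bounding boxes given as
# (x1, y1, x2, y2) corner coordinates.
def box_iou(a, b):
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # intersection 1, union 7
```

A detection is then usually counted as a true positive when its IoU with a ground-truth box exceeds a threshold (often 0.5), which is how mAP is built on top of IoU.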

Regression: Classification is used for qualitative analysis, while regression is used for quantitative analysis. A typical example is the estimation of the volume of a lesion. For the regression task, the root mean square error (RMSE), the mean absolute error, and the correlation coefficient are the most commonly used metrics.
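The two error metrics can be sketched as follows (our illustration, e.g., for predicted vs. measured lesion volumes; the function names are ours):

```python
import math

# Sketch: root mean square error and mean absolute error between
# predicted and target values.
def rmse(pred, target):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def mae(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

print(rmse([2.0, 4.0], [1.0, 3.0]), mae([2.0, 4.0], [1.0, 3.0]))  # 1.0 1.0
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are often reported together.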

Tracking: The tracking task aims to locate specific targets, but tracking is a dynamic process and is therefore different from the detection task. This means that tracking algorithms need to detect or localize targets in different frames. For medical image analysis, tracking tasks include the tracking of organs and tissues. The tracking is not just of one point; it can also be of an area, e.g., every part of an organ or tissue. An example is the tracking of the lung as the subject breathes.

Generation: The image data generation task has many different aims, but for simplicity we categorize all of these under the “generation task” because they focus on generating image data from other image data. Typical generation tasks include 1) generating a T2-weighted image from T1-weighted images and 2) generating a pathology image stained with one stain from an image stained with another stain.

Registration: The image registration task aims to align one image with another, i.e., to find a transformation (e.g., rotation and translation) that aligns the two images. Registration is a necessary process for computer-aided diagnosis algorithms working with multiple modalities. During medical scanning, the movement of the human body cannot be avoided and is a challenge. At the same time, imaging cannot be performed instantly. As a result, images from different viewpoints cannot be aligned directly, nor can images from two or more modalities. Therefore, researchers rely on registration techniques to solve these alignment problems.
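As a toy illustration of the kind of transformation mentioned above (ours, not from the paper): rigid registration in 2D applies a rotation by an angle theta followed by a translation (tx, ty); real registration methods search for the parameters that best align two images, but the transform itself is this simple.

```python
import math

# Sketch: apply a 2-D rigid transformation (rotation by theta,
# then translation by (tx, ty)) to a single point.
def rigid_transform(point, theta, tx, ty):
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty)

# Rotating (1, 0) by 90 degrees then translating by (1, 1)
# lands approximately at (1, 2)
print(rigid_transform((1.0, 0.0), math.pi / 2, 1.0, 1.0))
```

Multi-modal registration additionally needs a similarity measure (e.g., mutual information) to decide which parameters align the images best, since intensities differ across modalities.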

2.5 Source and Term

We collected the datasets and challenges mainly from The Cancer Imaging Archive [15], Grand Challenge, Kaggle, OpenNeuro, PhysioNet [16], and Codalab.

We originally collected four to five hundred records of datasets and challenges, and we removed some of them because some datasets are not suitable for DL and AI methods. We then categorized the remaining datasets and challenges into different groups. Categorizing the datasets and challenges is not easy, because all of them are derived from clinical research sources. Thus, we used an asymmetric categorization to group these datasets and challenges into four groups, as shown in Figure 1. This means that we did not use the same sub-taxonomy in each category or sub-category.

First, we split the medical datasets and challenges into two groups: body-level and cell-level (Section


5), according to the imaged body part. The body-level datasets focus on specific tissues, while the cell-level ones focus on cells. Second, we grouped the datasets and challenges of the brain, eye, and neck into one group (Section 3), because these are parts of the head. Third, we organized the datasets and challenges related to the chest and abdomen into the same group (Section 4). These datasets and challenges relate to diagnosis, anatomical segmentation, and treatment. Finally, the datasets and challenges that cannot be categorized into the above groups are grouped under “other” (Section 6); these datasets and challenges are related to the skin, bone, phantom, and animals.

The introduction of each group and sub-group mainly covers the type of modality, the task, the disease, and the body part. However, not all the groups of datasets can be introduced in that way. For some groups, we introduce the datasets and challenges according to the domain-specific problems. For example, we categorize the pathology datasets into microcosmic and macrocosmic tasks.

3 Head and neck related datasets and challenges

The head and neck are significant parts of the human body because many essential organs, glands, and tissues are located there. Much image analysis research relates to the head and neck. To make effective use of computers for research, diagnosis, and treatment, many researchers have released datasets and challenges, for example: 1) the analysis of tissue structure and functions (2, 3, 4, 6) and 2) disease diagnosis (30, 39, 47).

Because the brain controls emotions, actions, and the functions of other organs, the brain area is particularly significant. First, we introduce the datasets and challenges related to the analysis of the brain’s structure, function, imaging, and other basic tasks in Subsection 3.1. Second, we introduce the datasets and challenges related to brain disease diagnosis in Subsection 3.2.

Moreover, since the eyes are crucial to our vision, the computer-aided diagnosis of eye-related diseases is also an important research focus. The eye-related datasets and challenges are covered in Subsection 3.3. We introduce other datasets and challenges of the neck, and the datasets related to the brain's behavior and cognition, in Subsection 3.4.

3.1 Structural analysis tasks of the brain

The basic analysis and processing of brain medical images are clinically critical for diagnosis, treatment, and other brain-related analysis tasks. The datasets and challenges we discuss are mainly for segmentation tasks and center around the brain structure. In contrast, some datasets focus on imaging, including MR imaging acceleration, the non-linear registration of different resolutions, and tissue reconstruction. One of the most popular tasks is the segmentation of white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF), and the respective datasets and challenges are introduced in Subsection 3.1.1. Meanwhile, the segmentation of other tissues and functional areas is also a focus of research, and the related datasets and challenges are discussed in Subsection 3.1.2. Subsection 3.1.3 describes the other basic tasks. Table 1 shows the datasets and challenges of these basic tasks.

3.1.1 Segmentation of white and gray matter

The segmentation of WM, GM, and CSF has great significance for brain structure research and computer-aided diagnosis, particularly using AI. Similarly, for AI algorithms, understanding the structure of the human brain is also of great significance. Therefore, MICCAI and others have held many challenges with this research focus, so that researchers can design automatic algorithms to segment magnetic resonance images into different parts. We introduce these datasets and challenges with respect to their modalities and tasks.

Modality: The datasets and challenges which focus on WM, GM, and CSF segmentation usually provide MR images. Challenges (2, 3, 4, 5, 6, 7) mainly provide two modalities, T1 and T2, while datasets (1, 8) only provide T1 for the white matter hyperintensities segmentation task. Note that MR scans are sensitive to hydrogen atoms, and this feature can effectively help image analysts distinguish between different tissues and parts of the image. Moreover, the tissues are named "white matter" and "gray matter" after their color in MR images.

Task: The main focus of these datasets and challenges is the segmentation of WM, GM, and CSF. However, they do not only focus on that. Challenges (1, 4, 5, 6, 7) also provide the annotation of other parts of the brain, including the basal ganglia, white matter lesions, cerebellum, and infarctions. One of the challenges for segmentation is the presence of a lesion, because of the unnatural characterization of lesions. A well-annotated dataset


8 Johann Li 1 et al.

Table 1: Summary of datasets and challenges for the basic brain image analysis.

Index | Dataset/Challenge | Year | Modalities | Focus | Tasks
1 | CEREBRuM [17] | 2019 | T1 | WM, GM, CSF | Segmentation
2 | iSeg 2019 [18] | 2019 | T1, T2 | WM, GM, CSF | Segmentation
3 | iSeg 2017 [19] | 2017 | T1, T2 | WM, GM, CSF | Segmentation
4 | MRBrainS18 | 2018 | T1 (1), T2 (2) | WM, GM, CSF | Segmentation
5 | NEATBrainS15 [20] | 2015 | T1 (1), T2 (2) | WM, GM, CSF | Segmentation
6 | MRBrainS13 [21] | 2013 | T1 (1), T2 (2) | WM, GM, CSF | Segmentation
7 | Neonatal MRBrainS12 [22] | 2012 | T1, T2 | WM, GM, CSF | Segmentation
8 | WMH Seg. Chlg. [23] | 2017 | T1 | WM, GM, CSF | Segmentation
9 | ENIGMA Cerebellum | 2017 | dMR | Cerebellum | Segmentation
10 | Mindboggle [24] | 2012 | T1 | Brain atlases | Segmentation
11 | Labeled Brain Scans | 2012 | T1 | Brain atlases | Segmentation
12 | CAUSE07 [25] | 2007 | T1 | Caudate | Segmentation
13 | AutoImplant [26] | 2020 | OT (3) | Cranioplasty | Generation
14 | AccelMR 2020 [27] | 2020 | T1, T2 | Non-linear mapping of different resolutions | Generation
15 | MRI WM Reconstruction [28] | 2020 | dMR | White matter reconstruction | Generation
16 | Calgary-Campinas Brain Dataset [29] | 2020 | T1 | Brain image reconstruction | Generation
17 | MRI Reconstruction Challenge | 2020 | MR | MR reconstruction | Generation
18 | FastMRI [30] | 2018 | T1, T2 | Accelerating magnetic resonance imaging | Generation
19 | MUSHAC 2018 | 2018 | dMR | DW-MRI registration and enhancement | Generation, Registration
20 | HARDI 2013 | 2013 | dMR | Diffusion MRI reconstruction | Generation
21 | HARDI 2012 | 2012 | dMR | Diffusion MRI reconstruction | Generation
22 | CuRIOUS 2019 | 2019 | T1, OT (4) | Image registration | Registration
23 | CuRIOUS 2018 | 2018 | T1, OT (4) | Image registration | Registration
24 | 3D-VoTEM | 2018 | dMR | Fiber tractography | Segmentation
25 | DTI Tractography Challenge 2012 | 2012 | dMR | DTI tractography | Segmentation
26 | DTI Tractography Challenge 2011 | 2011 | dMR | DTI tractography | Segmentation

(1) T1w and T1w-IR. (2) T2-FLAIR. (3) CT. (4) Ultrasound.


can help AI to overcome this problem and also achieve more robust results. Challenges (5, 7) use MR images of the neonatal brain and consider tissue volumes as an indicator of long-term neurodevelopmental performance [22].

Performance metric: For the segmentation task, the Dice score is one of the most commonly used metrics, and all these datasets and challenges adopt it as a performance measure. Besides the Dice score, datasets (4, 6, 8) also use the Hausdorff distance and volumetric similarity as metrics; datasets (2, 3) use the average Hausdorff distance and the average surface distance among their metrics; moreover, dataset (8) also uses sensitivity and the F1-score for performance evaluation.
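As an illustration, these overlap and distance metrics are simple to compute on binary masks. The sketch below (NumPy only; the function names are our own, and the Hausdorff computation is simplified by using all foreground pixels rather than extracted surface points) evaluates Dice, volumetric similarity, and the symmetric Hausdorff distance on a toy 2-D example:

```python
import numpy as np

def dice(a, b):
    """Dice score: 2|A intersect B| / (|A| + |B|) for binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def volumetric_similarity(a, b):
    """1 - |V_a - V_b| / (V_a + V_b): compares volumes, ignores overlap."""
    va, vb = int(a.astype(bool).sum()), int(b.astype(bool).sum())
    return 1.0 - abs(va - vb) / (va + vb)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between the foreground point sets."""
    pa = np.argwhere(a).astype(float)
    pb = np.argwhere(b).astype(float)
    d = np.sqrt(((pa[:, None, :] - pb[None, :, :]) ** 2).sum(-1))
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Toy "segmentations": a 4x4 square vs. the same square shifted by one pixel.
gt = np.zeros((10, 10), dtype=bool); gt[2:6, 2:6] = True
pred = np.zeros((10, 10), dtype=bool); pred[3:7, 2:6] = True
```

On this pair, the Dice score is 0.75 (12 of 16 pixels overlap), the Hausdorff distance is 1.0 (a one-pixel shift), yet the volumetric similarity is 1.0 because both masks have equal volume; this is why volume-based metrics are usually reported alongside overlap- and distance-based ones.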

3.1.2 Segmentation of functional areas & other tissues

The segmentation of functional areas and tissues is also essential for brain-related research and computer-aided diagnosis. In this subsection, we introduce the datasets and challenges that are related to the segmentation of functional areas and tissues.

Tissue segmentation: While the segmentation of WM, GM, and CSF was introduced in Subsection 3.1.1, the segmentation of other brain tissues is also an active research area. Challenges (1, 4, 6, 7) aim to segment brain images into different tissues, including the ventricles, cerebellum, brainstem, and basal ganglia. These challenges provide MR images and voxel-level annotations of the regions of interest, with thirty or forty scans each. Because these regions are essential for brain health, researchers need to overcome the challenges related to their size and shape in order to segment them. Dataset (9) focuses on cerebellum segmentation from diffusion-weighted images (DWI), while dataset (12) focuses on the segmentation of the caudate from brain MR images.

Functional areas: The segmentation of the human brain cortex into different functional areas is of great significance in education, clinical research, treatment, and other applications. Datasets (10, 11) provide images and annotations for the design of automatic algorithms to segment the brain cortex into different functional areas. Dataset (10) uses the DTK protocol [24], which is modified from the DK protocol [31]; the DTK protocol includes 31 labels, the details of which are listed at https://mindboggle.readthedocs.io/en/latest/labels.html. Dataset (11) is a commercial dataset for research on the segmentation of functional areas of the brain cortex.

3.1.3 Imaging-related tasks

In addition to the segmentation tasks of the brain tissues and the functional areas, some of the datasets and challenges also focus on generation, registration, and tractography.

Generation: Datasets and challenges (14, 17, 18) aim to accelerate MR imaging or to generate high-resolution MR images from low-resolution ones. Usually, high-resolution imaging comes at a higher cost, while low-resolution imaging is cheaper but affects analytical judgment and may lead to an incorrect diagnosis. These challenges provide many low-resolution scans to allow researchers to design algorithms that convert or map low-resolution images onto higher-resolution ones. These datasets and challenges mainly focus on generation tasks. Another focus is cranioplasty (13), i.e., generating the missing part of a broken skull from CT images of the skull model. Other datasets and challenges (15, 19, 20, 21) focus on the reconstruction of MR images.
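To make the low-to-high resolution mapping task concrete, the toy sketch below (our own illustration, not any challenge's protocol) simulates a low-resolution acquisition by 2x2 average pooling and applies a trivial nearest-neighbor upsampling baseline, which learned generation methods aim to beat:

```python
import numpy as np

def downsample2x(img):
    """Simulate a low-resolution acquisition by 2x2 average pooling."""
    h, w = img.shape
    return img[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x_nearest(img):
    """Trivial nearest-neighbor baseline that a learned model should beat."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

hi = np.random.default_rng(0).random((64, 64))  # stand-in for a high-res slice
lo = downsample2x(hi)             # (32, 32) "acquired" low-res image
recon = upsample2x_nearest(lo)    # (64, 64) baseline reconstruction
```

A learned mapping would be trained to minimize the reconstruction error between `recon`-like outputs and the original high-resolution images; the nearest-neighbor baseline merely preserves the image mean.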

Registration: The registration between different modalities is another research focus. Challenges (22, 23) focus on the registration between ultrasound data and MR images of the brain. Cross-modality registration is difficult because the subject is not absolutely static. Moreover, MR is a 3D volume imaging modality, and hence differs from ultrasound, which is a 2D imaging modality. Thus, these challenges focus on establishing the topological relation between preoperative MR images and intraoperative ultrasound. Challenge (19) also focuses on diffusion MR image registration to eliminate differences between different vendors' hardware devices and protocols.

Tractography: Tractography is another segmentation task and focuses on the segmentation and imaging of the fibers in the WM. Dataset (24) aims to segment the fiber bundles from brain images, including phantom, squirrel monkey, and macaque data, while challenges (25, 26) focus on tractography with DTI, another type of MR image.

3.2 Brain diseases related datasets and challenges

Besides the structural analysis and image processing tasks, computer-aided diagnosis is also a research focus in healthcare. Medical image analysis plays a critical role in clinical research, diagnosis, and treatment. The datasets and challenges we have included are for two tasks: 1) the segmentation of lesions and tumors and 2) the classification of diseases. For the segmentation task,


the respective datasets and challenges focus on the tumor and lesion segmentation of the human brain, marking the lesion's contour for diagnosis and treatment; the relevant details are shown in Subsection 3.2.1. For classification tasks, the datasets and challenges have been used for the development of automatic algorithms to classify or predict diseases from medical images; these datasets and challenges are presented in Subsection 3.2.2.

3.2.1 Datasets for segmentation of tumors and lesions

Tumors and lesions in the brain affect human health and safety, and image analysis is an effective way to diagnose the relevant diseases. In this subsection, the related datasets and challenges are introduced; they are reported in Table 2.

Glioma datasets and challenges: Gliomas are among the most common brain malignancies in adults. Therefore, many challenges and datasets focus on the segmentation of gliomas for diagnosis and treatment. The BraTS challenge series (30, 31, 32, 33, 34, 35, 36, 37, 38) has been held since 2012 to segment gliomas. The difficulty of this segmentation task is caused by the heterogeneous appearance and shape of gliomas. The heterogeneity of a glioma is reflected in its shape, its appearance across modalities, and its many different histological sub-regions, such as the peritumoral edema, the necrotic core, and the enhancing and non-enhancing tumor core. Therefore, this series of challenges provides multi-modal MR scans to help researchers design and train algorithms to segment tumors and their sub-regions. The tasks of this challenge series include 1) low- and high-grade glioma segmentation (37, 38), 2) survival prediction from pre-operative images (32, 33), and 3) the quantification of segmentation uncertainty (30, 31). Besides the BraTS challenge series, dataset (47) targets the segmentation of low-grade gliomas and provides T1-weighted and T2-weighted MR images with the biopsy-proven gene status of each subject, obtained by fluorescence in-situ hybridization, a.k.a. FISH [46]. Dataset (46) focuses on the processing of brain tumors and aims to design and evaluate DL-based automatic algorithms for glioblastoma segmentation and further research.
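As an illustration of how such multi-modal scans are typically fed to a segmentation network, the sketch below (our own; random arrays stand in for co-registered NIfTI volumes, and per-modality z-scoring is a common but not mandated preprocessing choice) stacks the four modalities channel-wise:

```python
import numpy as np

def stack_modalities(volumes, eps=1e-8):
    """Stack co-registered modality volumes (e.g. T1, T1Gd, T2, T2-FLAIR)
    into a (C, D, H, W) multi-channel input, z-scoring each modality so
    that differing intensity scales do not dominate training."""
    shape = volumes[0].shape
    assert all(v.shape == shape for v in volumes), "modalities must be co-registered"
    return np.stack([(v - v.mean()) / (v.std() + eps) for v in volumes], axis=0)

rng = np.random.default_rng(0)
# Toy stand-ins for four co-registered BraTS-style volumes.
t1, t1gd, t2, flair = (rng.random((8, 16, 16)) for _ in range(4))
x = stack_modalities([t1, t1gd, t2, flair])  # shape (4, 8, 16, 16)
```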

Ischemic stroke lesion datasets and challenges: Similar to tumor segmentation, brain lesion segmentation also focuses on detecting brain abnormalities. However, the difference is that lesion segmentation deals with damaged tissues. Challenges (39, 40, 41, 42, 48) focus on stroke lesion segmentation, because stroke is also life-threatening and can disable the surviving patients.

Stroke is often associated with high socioeconomic costs and disabilities. Automatic analysis algorithms help to diagnose and treat stroke, whose manifestation is triggered by local thrombosis, hemodynamic factors, or embolic causes. In MR images, the infarct core can be identified with diffusion MR images, while the penumbra (which can be treated) can be characterized by perfusion MR images. The ISLES 2015 challenge (42) provides 50 and 60 multi-modality MR scans for training and validation, respectively, for its two subtasks, i.e., sub-acute ischemic stroke lesion segmentation and acute stroke outcome/penumbra estimation. The subsequent year's challenge, ISLES 2016 (41), focuses on the segmentation of lesions and the prediction of the degree of disability. This challenge provides about 70 scans, including clinical parameters and MR modalities such as DWI, ADC, and perfusion maps. The ISLES 2017 challenge (40) focuses on segmentation with acute MR images, while ISLES 2018 (39) focuses on the segmentation task based on acute CT perfusion data. Moreover, dataset (48) focuses on the segmentation of the brain after stroke for further treatment.
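The core/penumbra distinction described above can be caricatured with simple thresholding. The sketch below is a toy illustration only: the arrays are synthetic one-dimensional "maps", and the ADC and Tmax thresholds are commonly cited clinical values, not rules defined by the ISLES challenges:

```python
import numpy as np

def core_and_penumbra(adc, tmax, adc_thresh=620.0, tmax_thresh=6.0):
    """Toy diffusion/perfusion mismatch: the infarct core shows restricted
    diffusion (low ADC, in 1e-6 mm^2/s), while the penumbra is hypoperfused
    (high Tmax, in seconds) but not yet infarcted."""
    core = adc < adc_thresh
    penumbra = (tmax > tmax_thresh) & ~core  # tissue at risk, still salvageable
    return core, penumbra

# Synthetic per-voxel "maps" for five voxels.
adc = np.array([400.0, 500.0, 800.0, 900.0, 1000.0])
tmax = np.array([10.0, 8.0, 9.0, 2.0, 1.0])
core, pen = core_and_penumbra(adc, tmax)
```

Real challenge entries learn these boundaries from annotated scans rather than fixed thresholds, but the mismatch concept (hypoperfused minus infarcted tissue) is the same.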

Intracranial hemorrhage related datasets: Intracranial hemorrhage is another type of medical condition that affects our health. Dataset and challenge (43, 44) focus on the detection and segmentation of intracranial hemorrhage to help medics locate the hemorrhage regions and decide on a treatment plan. Dataset (45) also provides data for the classification of normal vs. hemorrhage CT images.

Multiple sclerosis lesion related datasets: The multiple sclerosis lesion is another kind of brain lesion, which is not usually immediately life-threatening but can cause disabilities. Datasets and challenges (49, 50, 51) address multiple sclerosis lesion segmentation with multi-modality MR data (T1w, T2w, FLAIR, etc.).

3.2.2 Classification of brain disease

Besides tumor and lesion segmentation, brain disease classification also plays an essential role in healthcare. Brain-related diseases have a severe effect on patients' health and their lives, e.g., Alzheimer's disease (AD) [63,64,65,66] and Parkinson's disease (PD). Therefore, effective diagnosis and early intervention can reduce the damage to patients' health, the burden on their families, and the economic impact on society. In this section, we first introduce the datasets and challenges of AD (52, 53, 54, 55, 56, 62),


Table 2: Summary of datasets and challenges for the brain lesion and tumor segmentation task.

Index | Dataset/Challenge | Year | Modalities | Focus | Lesion/Tumor
27 | CADA-AS | 2020 | OT (5) | Cerebral aneurysm | cerebral aneurysm
28 | CADA-RRE | 2020 | OT (5) | Cerebral aneurysm | cerebral aneurysm
29 | CADA | 2020 | OT (5) | Cerebral aneurysm | cerebral aneurysm
30 | BraTS 2020 [32,33,34] | 2020 | T1 (1), T2 (2) | Multi-modalities | brain tumor
31 | BraTS 2019 [32,33,34] | 2019 | T1 (1), T2 (2) | Multi-modalities | brain tumor
32 | BraTS 2018 [34] | 2018 | T1 (1), T2 (2) | Multi-modalities | brain tumor
33 | BraTS 2017 [35] | 2017 | T1 (1), T2 (2) | Multi-modalities | brain tumor
34 | BraTS 2016 [32] | 2016 | T1 (3), T2 (4) | Multi-modalities | brain tumor
35 | BraTS 2015 [32] | 2015 | T1 (3), T2 (4) | Multi-modalities | brain tumor
36 | BraTS 2014 [32] | 2014 | T1 (3), T2 (4) | Multi-modalities | brain tumor
37 | BraTS 2013 [32] | 2013 | T1 (3), T2 (4) | Multi-modalities | brain tumor
38 | BraTS 2012 [32] | 2012 | T1 (3), T2 (4) | Multi-modalities | brain tumor
39 | ISLES 2018 | 2018 | OT (6) | Ischemic stroke | stroke lesion
40 | ISLES 2017 | 2017 | T1, OT (7) | Ischemic stroke | stroke lesion
41 | ISLES 2016 | 2016 | OT (8) | Ischemic stroke | stroke lesion
42 | ISLES 2015 [36] | 2015 | T1, T2, OT (9) | Ischemic stroke | stroke lesion
43 | CT-ICH [37] | 2020 | OT (10) | Intracranial hemorrhage | intracranial hemorrhage
44 | RSNA Intracranial Hemorrhage Detection | 2019 | OT (10) | Intracranial hemorrhage | intracranial hemorrhage
45 | HeadCT-Hemorrhage | 2019 | OT (10) | Intracranial hemorrhage | intracranial hemorrhage
46 | Brain Tumor Progression [38] | 2018 | T1, T2, OT (9) | Brain tumor | brain tumor
47 | LGG-1p19qDeletion [39,40] | 2017 | T1, T2 | Low-grade gliomas | brain tumor
48 | ATLAS [41] | 2017 | T1 | Anatomical segmentation | stroke lesion
49 | MSSEG Challenge [42,43] | 2016 | T1, T2, OT (11) | Multiple sclerosis | sclerosis lesion
50 | MS Challenge 2015 [44,45] | 2015 | T1, T2, OT (12) | Longitudinal multiple sclerosis | sclerosis lesion
51 | MSSeg 2008 | 2008 | T1, T2, OT (9) | Multiple sclerosis | sclerosis lesion

(1) For BraTS17 to 20, the T1 modality includes the T1 image and the T1Gd image. (2) For BraTS17 to 20, the T2 modality includes the T2 image and the T2-FLAIR image. (3) For BraTS12 to 16, the T1 modality includes the T1 image and the T1c image. (4) For BraTS12 to 16, the T2 modality includes the T2 image and the T2w-FLAIR image. (5) MR Angiography. (6) DWI and Perfusion MR image. (7) T2w and FLAIR. (8) ADC and Perfusion MR image. (9) FLAIR. (10) CT. (11) DP/T2 and FLAIR. (12) PDw and FLAIR.


Table 3: Summary of datasets and challenges for brain disease classification tasks.

Index | Reference Dataset/Challenge | Year | Modalities (T1 T2 DWI PT OT) | Diseases (AD PD OT) | Category

52 ADNI-1 [47] 2004 X X X X X NC, MCI, AD
53 ADNI-GO [47] 2009 X X X X X NC, MCI, AD
54 ADNI-2 [48] 2011 X X X X X NC, EMCI, LMCI, AD
55 ADNI-3 [49] 2016 X X X X X NC, EMCI, LMCI, AD
56 OASIS-1 [50] 2007 X X NC, AD
57 OASIS-2 [51] 2009 X X NC, AD
58 OASIS-3 [52] 2019 X X X X6 X NC, AD
59 MRIHS [53,54] 2019 X X NC, AD
60 TADPOLE [55] 2017 X X X X X1 X NC, MCI, AD
61 MRI and Alzheimers 2017 X X NC, AD
62 CADDementia [56] 2014 X X NC, MCI, AD
63 ANT [57,58] 2019 X X2 X NC, PD
64 PD De Novo [59,60] 2018 X X3 X NC, PD
65 SCA2 DTI [61,62] 2018 X X X4 NC, SCA2

66 MTOP 2016 X X X5 Healthy, category I or category II

1 CSF; 2 Events and BOLD; 3 BOLD; 4 Spinocerebellar ataxia type II; 5 Mild traumatic brain injury; 6 MRI FLAIR

and then we introduce other diseases (63, 64, 65). Table 3 shows the relevant challenges and datasets.

Alzheimer's disease: AD affects a person's behavior, cognition, memory, and daily life activities. This progressive neurodegenerative disorder disrupts the normal daily life of patients: sufferers gradually lose track of who they are and what they should do, until they eventually forget everything they know. The disease takes an unbearable toll on the patient and leads to a high cost for their loved ones and for society. For example, according to [67], AD was the sixth leading cause of death in the U.S. in 2018 and costs more than two to three hundred billion U.S. dollars.

Therefore, researchers are doing everything they can to explore the causes of AD and its treatments. Diagnosis based on medical images has become a research focus because early diagnosis and intervention have a significant effect on the progression of this disease. Hence, many researchers work on the classification, i.e., prediction, of AD using brain images. The main datasets include the "Alzheimer's Disease Neuroimaging Initiative (ADNI)" and the "Open Access Series of Imaging Studies (OASIS)".

The ADNI is a series of projects that aim to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of AD. It includes four stages: ADNI-1 (52), ADNI-GO (53), ADNI-2 (54), and ADNI-3 (55). These projects provide image data of the brain for researchers, and the modalities of the images

include MR (T1 and T2) and PET (FDG, PIB, Florbetapir, and AV-1451). These four stages consist of 1400 subjects. The subjects can be categorized into normal cognition (NC), mild cognitive impairment (MCI), and AD, where MCI can be split into early mild cognitive impairment (EMCI) and late mild cognitive impairment (LMCI).

The OASIS is a series of projects aiming to provide neuroimaging data of the brain, which researchers can freely access. OASIS released three datasets, named OASIS-1, OASIS-2, and OASIS-3. All three datasets are related to AD, but they are also used for functional area segmentation and other tasks. OASIS-1 (56) contains 418 subjects aged from 18 to 96; among the subjects older than 60, 100 were diagnosed with AD. The dataset includes 434 MR sessions. OASIS-2 (57) contains 150 subjects aged between 60 and 96, and each subject has three or four MR sessions (T1). About 72 subjects were diagnosed as normal, while 51 subjects were diagnosed with AD. Besides, 14 subjects were diagnosed as normal but were characterized as AD at a later visit. OASIS-3 (58) includes more than 1000 subjects, more than 2000 MR sessions (T1w, T2w, FLAIR, etc.), and more than 1500 PET sessions (PIB, AV45, and FDG). The dataset includes 609 normal subjects and 489 AD subjects.

Moreover, there are many other challenges based on ADNI, OASIS, or independent datasets. Challenge (60) is based on ADNI and aims at the prediction of the longitudinal evolution. Dataset (61) is based on OASIS


and it is released on Kaggle for the classification of AD. Challenge (62) is an independent AD-related challenge to classify subjects into NC, MCI, and AD.

Other diseases: Similar to AD, other brain diseases are also important from the diagnosis and treatment perspective. However, the number of datasets and challenges for these diseases is not as large as for AD. A few datasets focus on Parkinson's disease (PD) and spinocerebellar ataxia type II (SCA2). Datasets (63, 64) provide MR images of PD with classification labels. Dataset (65) provides images and classification labels for spinocerebellar ataxia type II, i.e., SCA2. Dataset (66) provides images and annotations for the diagnosis of mild traumatic brain injury.

3.3 Eye related datasets and challenges

As the human body's imaging sensors, the eyes are essential to human beings, and eye diseases may lead to blindness. We introduce the relevant challenges and datasets in this subsection and list them in Table 4.

Datasets according to the modality: With regard to the eye-related datasets and challenges, the main modalities used are the fundus photo (70, 72, 73, 74, 75, 76, 77, 82, 84) and OCT (71, 78, 79, 81, 83). The fundus photo can help medics evaluate the eye's health and locate retinal lesions, because it clearly shows the important parts of the eye, such as the blood vessels and the optic disc. OCT is a newer imaging approach that is safe for the eye and shows the retinal tissues in detail. However, it also has disadvantages: it is not suitable for diagnosing microangioma or for planning retinal laser photocoagulation treatment.

Datasets according to the analysis task: These datasets and challenges can be used for four tasks.

1) Classification tasks focus on classifying whether the subject has specific diseases or judging whether the subject is abnormal. Datasets and challenges (69, 70, 71, 74, 75, 76, 82) focus on predicting a single disease, while others (73, 77, 78, 79) focus on diagnosing multiple diseases.

2) Segmentation is another task, which provides more information compared to classification. Datasets and challenges (70, 72, 74, 77, 81, 83) focus on the segmentation of tissues and lesions for further diagnosis and disease analysis.

3) Datasets and challenges (70, 71, 74, 76, 77, 84) focus on the detection of lesions or other landmarks.

These tasks help medics locate key targets, such as areas and tissues, for effective diagnosis, or provide feature details for other automated algorithms.

4) Unlike the other tasks, the last one focuses on the annotation of the tools used for eye-related surgery (80).

Datasets according to the targeted eye diseases: Researchers mainly focus on these diseases:

– Diabetic retinopathy (73, 75, 77, 78, 79, 82, 84)
– (Age-related) macular degeneration (73, 76, 78)
– Pathologic myopia (73, 74)
– Diabetic macular edema (77, 83)
– Glaucoma (70, 73)
– Cataract (73)
– Closure glaucoma (71)
– Hypertension (73)

Besides these diseases, dataset (80) aims at the annotation of images.

3.4 Datasets and challenges of other subjects

Besides the brain's structural analysis, the image processing, and the computer-aided diagnosis tasks, another important research focus is the human neck, because it holds many essential glands and organs. This subsection discusses the datasets and challenges of the neck and teeth, covered in Subsection 3.4.1 and Subsection 3.4.2, respectively. Moreover, many researchers are working on the analysis of behavior and cognition with DL-based methods. We discuss the details in Subsection 3.4.3.

3.4.1 Neck related datasets

The neck is also essential for our health. It holds many glands and organs, and when these become abnormal, effective diagnosis and segmentation play an essential role in their treatment. The related image datasets and challenges are listed in Table 5.

Datasets and challenges (85, 87, 88, 90, 94, 95) focus on the segmentation of glands and of the lesions and tumors in the relevant glands. Dataset (89) focuses on the binary classification of tumor vs. normal. Challenge (86) aims at the task of thyroid gland nodule detection with ultrasound images and videos. Challenge (91) focuses on nerve segmentation in the neck, while challenge (96) focuses on evaluating the carotid bifurcation.


Table 4: Summary of datasets and challenges of eye-disease related tasks.

Index | Dataset/Challenge | Year | Modalities | Diseases (1) | Tasks
67 | RIADD [68] | 2020 | FP | AMD, DR, OT (2) | Classification
68 | The 2nd DeepDRiD | 2020 | FP | DR | Classification
69 | REFUGE2 [69] | 2020 | FP | G | Classification, Segmentation, Detection
70 | REFUGE [69] | 2018 | FP | G | Classification, Segmentation, Detection
71 | AGE [70] | 2019 | OCT | OT (7) | Classification, Detection
72 | DRIVE | 2019 | FP | OT (6) | Segmentation
73 | ODIR-2019 | 2019 | FP | AMD, DR, G, OT (8) | Classification
74 | PALM | 2019 | FP | OT (9) | Classification, Segmentation, Detection
75 | APTOS 2019 | 2019 | FP | DR | Classification
76 | ADAM [71] | 2018 | FP | AMD | Classification, Detection
77 | IDRiD [72,73,74] | 2018 | FP | DR, OT (10) | Classification, Segmentation, Detection
78 | Retinal OCT Images [75,76] | 2018 | OCT | DR, OT (11) | Classification
79 | ROCC | 2017 | OCT | DR | Classification
80 | CATARACTS [77,78] | 2017 | OT (3) | OT (4) | Other (12)
81 | RETOUCH [79] | 2017 | OCT | AMD | Segmentation (5)
82 | Diabetic Retinopathy Detection [80] | 2015 | FP | DR | Classification
83 | SegOCT (DME) [81] | 2015 | OCT | OT (10) | Segmentation
84 | ROC [82] | 2009 | FP | DR | Detection

(1) AMD: age-related macular degeneration; DR: diabetic retinopathy; G: glaucoma; FP: fundus photo. (2) All the diseases of this dataset are listed on the official website; see https://riadd.grand-challenge.org/Data/. (3) Video. (4) Surgery tools detection. (5) Fluid segmentation. (6) Vessel extraction. (7) Closure glaucoma. (8) Diabetes, cataract, hypertension, and myopia. (9) Pathologic myopia. (10) Diabetic macular edema. (11) Macular degeneration. (12) Tool annotation.


Table 5: Summary of datasets and challenges of head and neck related diseases.

Index | Reference Dataset/Challenge | Year | Modalities (CT PT OT) | Focus | Tasks (Seg. Other)

85 MICCAI 2020: HECKTOR 2020 X X Head and neck primary tumors X

86 TN-SCUI 2020 [83] 2020 X1 Thyroid gland nodules diagnosis X2

87 Head Neck Radiomics HN1 [84,85] 2019 X Head and neck squamous cell carcinoma X

88 AAPM RT-MAC 2019 [86] 2019 X3 Soft tissue and tumor X

89 Head Neck PET-CT [87,88] 2017 X X Tumor X4

90 Head & Neck AutoSeg Challenge [89] 2015 X Tumor X

91 Ultrasound Nerve Segmentation 2016 X1 Nerve X

92 Dental X-Ray Analysis 2 2015 X5 Cephalometric X6

93 Dental X-Ray Analysis 1 2015 X5 Caries X

94 Head & Neck AutoSeg 2010 [90] 2010 X7 Parotid gland X
95 Head & Neck AutoSeg 2009 2009 X Multi-organs and tissues X

96 CLS 2009 [91] 2009 X8 Carotid bifurcation X
1 Ultrasound; 2 Detection; 3 MR T2-weighted image; 4 Classification; 5 Computed Radiography; 6 Localization; 7 MR; 8 CT Angiography

3.4.2 Cephalometric and teeth related datasets

Challenges (92, 93) focus on the diagnosis of dental X-ray images. The main tasks of these two challenges include landmark localization and caries segmentation. Challenge (92) provides around 400 cephalometric X-ray images with annotations of landmarks by two experienced dentists. Challenge (93) provides about 120 bitewing images with experts' annotations of the different parts of the teeth.

3.4.3 Behavior and cognition datasets

To understand what we see, hear, smell, and feel, our brain draws on its neurons to compute and analyze the stimulations and make sense of the what, where, why, and when of a scenario. Many researchers now use artificial neural networks as a research method to analyze the relationship between brain activities and stimulation. They use functional MR images to scan brain activity, analyze the hemodynamic feedback, and identify the areas of neurons that react. Therefore, the analysis of the brain's reaction to a specific stimulation is an important research focus. Researchers use DL to detect or decode the stimulation presented to subjects in order to work out the brain's functionality. The related datasets are listed in Table 6.
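A minimal caricature of such stimulus decoding is a nearest-centroid classifier over voxel activation patterns. The sketch below is entirely synthetic: random vectors stand in for fMRI trials, and the two class labels ("faces" vs. "scenes") are hypothetical examples:

```python
import numpy as np

def fit_centroids(X, y):
    """Class-mean 'templates' over voxel patterns, one per stimulus class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def decode(centroids, x):
    """Assign a trial to the nearest class template (Euclidean distance)."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

rng = np.random.default_rng(0)
# Toy trials: 40 activation patterns of 50 voxels each, two stimulus classes.
X = np.concatenate([rng.normal(0.7, 1.0, (20, 50)),    # class 0, e.g. "faces"
                    rng.normal(-0.7, 1.0, (20, 50))])  # class 1, e.g. "scenes"
y = np.array([0] * 20 + [1] * 20)
centroids = fit_centroids(X, y)
```

Real decoding studies replace the class means with trained models (linear classifiers, deep networks, or generative decoders, as in the face-reconstruction datasets below), but the underlying idea of mapping activation patterns back to stimuli is the same.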

Some datasets (98, 103, 107) focus on classifying the stimulations or the subject's attributes based on the subject's functional MR images. Dataset (98) aims to identify whether a subject is a beginner or an expert in programming from the reaction of their brain to source code. Dataset (103) focuses on distinguishing subjects with depression from subjects without depression using audio stimulations and analyses of the subjects' brain activity. Dataset (107) works on the influence of cannabis on the brain.

Datasets (97, 99, 101, 102, 104, 105, 106) focus on the decoding of the stimulations, i.e., the decoding of brain activity. Datasets (101, 104, 105) aim to rebuild what subjects have seen from their brain activity, using DL-based methods on functional MR images. On the other hand, datasets (99, 106) work on reconstructing the faces that subjects have seen from functional MR images with similar modalities.

4 Chest and abdomen related datasets and challenges

There are many vital organs in the chest and abdomen. For example, the heart is responsible for the blood supply; the lungs are responsible for breathing; the kidneys are responsible for the production of urine to eliminate toxins from the body. Therefore, the medical image analysis of organs in the chest and abdomen is an important research focus. Most of the tasks are computer-aided diagnosis, with the classification, detection, and segmentation of lesions being the most targeted tasks.

Many datasets and challenges aim to segment one or more organs in the chest and abdomen for diagnosis or treatment planning. Subsection 4.1 discusses the datasets and challenges for segmentation. Subsection 4.2 introduces the datasets and challenges which focus on the diagnosis of organs in the chest and abdomen. Subsection 4.3 describes the datasets and challenges of the chest and abdomen that are not catego-


Table 6: Summary of datasets and challenges that are used for behavioral and perception related tasks.

Index | Reference Dataset/Challenge | Year | Modalities (T1, T2, BOLD, Events, OT) | Tasks | Stimulation
97 | Cognitive control of sensory pain encoding in the pregenual anterior cingulate cortex [92] | 2020 | X X | Sensory pain encoding | pain
98 | fMRI dataset on program comprehension and expertise [93,94] | 2020 | X X X | Differences between experts and novices in cortical representations of source code | brain action
99 | Reconstructing Faces from fMRI Patterns using Deep Generative Neural Networks [95] | 2019 | X | Face reconstruction | vision
100 | Resting State - TMS [96,97] | 2019 | X X | Effect of iTBS on fronto-striatal network and ROI segmentation | iTBS
101 | Deep Image Reconstruction [98,99] | 2018 | X X X X | Image reconstruction from human brain activity | vision
102 | BOLD5000 [100] | 2018 | X X X X1 | Brain reaction to vision | vision
103 | Neural Processing of Emotional Musical and Nonmusical Stimuli in Depression [101,102,103] | 2018 | X X | Brain reaction to audition | audio
104 | Visual image reconstruction [104] | 2018 | X X X | Image reconstruction from human brain activity | vision
105 | Generic Object Decoding [105] | 2018 | X X X X | Image reconstruction from human brain activity | vision
106 | Adjudicating between face-coding models with individual-face fMRI responses [106] | 2018 | X | Decoding faces from brain activity | vision
107 | T1w structural MRI study of cannabis users at baseline and 3 years follow up [107] | 2018 | X | Impact of cannabis on the brain | cannabis

1 DWI, Field map

These uncategorized tasks include regression, tracking, registration, and other tasks related to the chest and abdomen organs.

4.1 Datasets for chest & abdomen organ segmentation

This subsection covers the datasets and challenges of the chest and abdomen organs that are used for anatomic segmentation tasks. The anatomic segmentation tasks include organ contour segmentation (Subsection 4.1.1) and organ parts segmentation (Subsection 4.1.2). Contour segmentation differs from organ parts segmentation: the former aims to separate an organ from the background, or to mark the boundaries between multiple organs and the background, while the latter aims to segment the organ into different parts at the anatomical level. Table 7 presents the datasets and challenges that are used for the segmentation of the chest and abdomen organs.

4.1.1 Chest & abdomen organ contour segmentation

Organ contour segmentation provides necessary information for the preplanning of surgery and for diagnosis. A well-segmented contour of the organs provides a precise mask, which helps to produce accurate segmentation results for diagnosis, treatment, and operation. This subsection introduces datasets and challenges for the contour segmentation of a single organ and of multiple organs.

Chest & abdomen datasets according to the organ: The datasets and challenges that we have covered are listed here. The following organs and parts are involved:

– Liver (113, 114, 116, 122, 127, 128, 144)
– Lung (113, 116, 118, 120, 128, 139, 143)
– Kidney (110, 113, 114, 127, 128)
– Prostate (116, 126, 137, 138, 142)
– Heart (111, 115, 116, 120)
– Pancreas (116, 123, 127, 128)


Table 7: Summary of datasets and challenges for the chest and abdomen organ segmentation tasks.

Index | Dataset/Challenge | Year | Organs
108 | MNMS Challenge [108] | 2020 | Heart
109 | Automated Segmentation of Coronary Arteries | 2020 | Coronary arteries (coronary CT angiography)
110 | C4KC-KiTS [109,110] | 2019 | Kidney
111 | MS-CMRSeg 2019 [111,112] | 2019 | Heart
112 | CAMUS [113] | 2019 | Heart (ultrasound)
113 | CT-ORG [114,115,116] | 2019 | Liver, lung, kidney, bladder
114 | CHAOS [117,118,119] | 2019 | Liver, kidney, spleen
115 | SegTHOR [120] | 2019 | Heart, aorta, trachea, esophagus
116 | Medical Segmentation Decathlon [121] | 2019 | Liver, lung, prostate, heart, pancreas, colon, and other non-chest or -abdomen organs
117 | PAVES | 2018 | Vein
118 | SHCXR Lung Mask [122,123,124] | 2018 | Lung
119 | AtriaSeg 2018 [125] | 2018 | Heart (left atrium)
120 | Lung CT Segmentation Challenge 2017 [126,127] | 2017 | Lung, heart, esophagus, spinal cord (radiotherapy structure sets)
121 | AAPM Thoracic AutoSeg | 2017 | Thoracic organs-at-risk
122 | LiTS [116] | 2017 | Liver
123 | Pancreas CT [128,129] | 2016 | Pancreas
124 | Breast MRI NACT Pilot [130] | 2016 | Breast
125 | HVSMR 2016 [120,131] | 2016 | Heart
126 | Prostate Diagnosis [132] | 2015 | Prostate
127 | Multi-Atlas Labeling Beyond the Cranial Vault | 2015 | Liver, kidney, adrenal glands, aorta, esophagus, gall bladder, pancreas, splenic/portal veins, spleen, stomach, vena cava
128 | Anatomy3 [133] | 2015 | Liver, lung, kidney, spleen, urinary bladder, rectus abdominis muscle, 1st lumbar vertebra, pancreas, psoas major muscle, gall bladder, sternum, aorta, trachea, adrenal gland
129 | CT Lymph Nodes [134,135,136,137] | 2015 | Lymph nodes
130 | CETUS | 2014 | Heart (ultrasound)
131 | VISCERAL Benchmark2 [138] | 2014 | Spleen, urinary bladder, rectus abdominis muscle, 1st lumbar vertebra, pancreas, psoas major muscle, gall bladder, sternum, aorta, trachea, adrenal gland
132 | Left Atrium Segmentation Challenge [139] | 2013 | Heart (left atrium)
133 | NCI-ISBI 2013 [115] | 2013 | Prostate
134 | Left Atrium Fibrosis and Scar Segmentation Challenge | 2012 | Heart (left atrium fibrosis and scar)
135 | VESSEL12 | 2012 | Lung vessels
136 | CRASS12 | 2012 | Chest structure
137 | PROMISE12 [140] | 2012 | Prostate
138 | Prostate-3T [141] | 2012 | Prostate
139 | LOLA11 | 2011 | Lung
140 | Left Ventricular Segmentation Challenge | 2011 | Heart (left ventricle)
141 | IVUS11 [142] | 2011 | Vessel (intravascular ultrasound)
142 | Promise09 | 2009 | Prostate
143 | EXACT09 [143] | 2009 | Lung
144 | SLIVER07 [144] | 2007 | Liver


– Aorta (115, 127, 128)
– Esophagus (115, 120, 127)
– Spleen (114, 127, 128)
– Adrenal glands (127, 128)
– Bladder (113, 128)
– Gall bladder (127, 128)
– Trachea (115, 128)
– Colon (116)
– Breast (124)
– Lymph (129)
– Spinal cord (120)
– Stomach (127)

Generally, these datasets and challenges focus on the larger organs, such as the liver and the lungs, with the aim to diagnose tumors and lesions; here, contour segmentation is a pre-processing step. However, it is challenging to segment smaller organs in low-resolution images, particularly for radiotherapy, because an incorrect contour segmentation of these small organs can lead to severe consequences, such as damage to the organs during treatment.

Chest & abdomen datasets according to modality: The most commonly used image modalities for chest and abdomen organ segmentation are MR and CT. As Table 1 shows, many datasets and challenges use MR images. MR images have higher resolution under certain conditions and image soft tissues and organs, such as the heart and prostate, better. Meanwhile, according to our research, CT is the most widely used modality for organ segmentation and for other tasks and diagnoses related to the chest and abdomen, such as the lung and liver, because of its convenience, effectiveness, and low cost.

Chest & abdomen datasets according to focus: The purpose of these datasets and challenges can be categorized into three groups: further analysis, benchmarking, and radiotherapy. Most datasets and challenges which provide annotated organ contours are provided with the objective to focus on further analysis and treatments. One of the challenges of segmentation is to achieve a robust segmentation of the whole organ and separate it from the background without omitting the lesions and tumors; thus, some test benchmarks (116, 128) are provided for researchers to evaluate their algorithms. Another challenge, which is addressed by datasets and challenges (115, 120), is the imbalance between different organs' sizes and shapes; such an imbalance makes it difficult to segment small organs and provide valuable information for analysis and treatment.
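Benchmarks such as these typically score submissions with overlap metrics, most commonly the Dice coefficient. A minimal NumPy sketch of the metric (illustrative only; each challenge defines its own exact evaluation protocol, and the toy masks below are invented):

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# toy 2D "organ" masks: the prediction overlaps the ground truth partially
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True           # 16 voxels
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 2:6] = True            # 16 voxels, 12 of which overlap the truth
print(dice(pred, truth))         # → 0.75
```

The same formula extends unchanged to 3D volumes, since only voxel counts are involved.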

Single chest & abdomen organ contour segmentation: The single organ contour segmentation tasks usually focus on segmenting a region for subsequent tasks (110, 122, 123, 124, 129, 138, 144) or with an anatomical purpose (118, 126, 133, 137, 142, 143) for research. The difficulty of the former is that lesions and tumors may affect the separation of the organ from the background, while the difficulty of the latter is to perform more precise segmentation.

Chest & abdomen multi-organ contour segmentation: The chest and abdomen multi-organ contour segmentation focuses on splitting the organs from each other. Some of these datasets and challenges (113, 114, 116) focus on the segmentation of multiple organs, including the relatively larger organs, which are easier to segment, and the relatively smaller organs, which are more challenging, especially when the model handles the larger and smaller organs at the same time. Similarly, some of these datasets and challenges (115, 120) focus on the "organ at risk", i.e., organs which are healthy but might be at risk because of radiation therapy. Dataset (127) focuses on multi-atlas-based methods, which are widely used in brain-related research. Dataset (128) aims to provide a benchmark for segmentation algorithms.

4.1.2 Chest & abdomen organ parts segmentation

Different from contour segmentation of the chest and abdomen organs, organ parts segmentation aims to segment an organ into its different parts. Just as the hand has five fingers, organs are made up of multiple parts; a typical example is the Couinaud system, which divides the liver into functional segments. This subsection introduces the datasets and challenges for organ parts segmentation. These datasets and challenges are also listed in Table 7.

Heart related datasets and challenges: Most of these datasets and challenges (112, 119, 125, 130, 132, 134, 140) are related to heart segmentation. The most frequently used modalities are MR and ultrasound, and the aim is to segment the heart into the left atrium, chambers, valves, and other parts. Though MR and ultrasound can effectively image the different tissues of the heart, the heartbeat results in blurred images, which makes the segmentation task more difficult; for ultrasound, the dynamic nature of the images is an additional challenge for segmentation algorithms.


Other chest & abdomen body parts: Challenge (139) provides 55 CT scans and focuses on the segmentation of the lung with labels for its different parts: outside the lungs, the left lung, the upper lobe of the left lung, the lower lobe of the left lung, the upper lobe of the right lung, the middle lobe of the right lung, and the lower lobe of the right lung. The biggest challenge is the effect of lung lesions and diseases, such as tuberculosis and pulmonary emphysema, on the performance of the segmentation. Moreover, challenges (135, 141) focus on the segmentation of the lung vessels.

4.2 Datasets for diagnosis of chest & abdomen diseases

Diseases of organs in the chest and abdomen have a significant impact on human health. Therefore, many researchers work on this problem by analyzing medical images. Several researchers have designed automatic or semi-automatic algorithms for classification, segmentation, detection, and characterization tasks to help medics diagnose these diseases. In this subsection, we describe the datasets and challenges related to the diagnosis of diseases of the chest and abdomen; they are reported in Tables 8, 9, and 10.

Chest & abdomen datasets according to modality: According to the datasets and challenges we collected, CT is the most commonly used imaging modality for the chest & abdomen because of its suitable imaging quality and its ability to clearly display tissues and lesions. Some datasets and challenges also provide CT images acquired with contrast agents for clearer images. Besides CT imaging, there are other modalities, including MR, X-Ray digital radiography, PET, endoscopy, etc. MR images are used in breast-related diagnosis, cardiac-related tasks, soft tissue sarcoma detection, and ventilation imaging. Because of the organs' size and CT's resolution, which is limited by the imaging exposure time and radiation dose, MR is a more suitable imaging modality for small or specific organs. PET is typically used together with other modalities, such as CT and MR. The density of the tracer is related to metabolism, which means the radiation density from the tracer will be high in a tumor, so PET is mostly used for tumor-related tasks. Endoscopy images are used for medical inspection of the stomach, intestines, and other organs.

Chest & abdomen datasets according to classification of diseases: The classification of diseases intends to determine whether a subject is healthy or not. It is sometimes called "detection" or "prediction", but it differs from the detection task presented below.

The main focus of these datasets is to judge whether there is any cancer, lesion, or tumor, such as soft tissue sarcoma (192), prostate lesions (177, 184), lung cancer (161), and breast cancer (160). Classification is an effective task for diagnosis, particularly computer-aided diagnosis. A quick and early diagnosis allows effective interventions that increase the probability of the patient's recovery before the condition worsens.

Another focus is the classification of diseases. These diseases mainly include pneumothorax (164), cardiac diseases (175), tuberculosis (178), pneumonia (179), and COVID-19, which is discussed at the end of this subsection. The endoscopy-related challenges provide RGB images and videos with the aim to classify patients into "normal" vs. "abnormal". Dataset (169) focuses on classification based on diagnostic records. These datasets and challenges provide data for researchers to design AI-based algorithms to diagnose common diseases.

Chest & abdomen datasets for attribute classification: The characterization of tumors and lesions is also called attribute classification; it focuses on the characterization analysis that follows the detection and segmentation of tumors and lesions by automatic analysis algorithms. A typical example is the attribute classification of pulmonary nodules and lung cancer (159, 162, 168, 186, 189, 193). The datasets and challenges usually provide CT scans with annotations of different attributes, such as lesion type, spiculation, lesion localization, margin, lobulation, calcification, cavity, etc. Each attribute includes two or more categories. Another focus is the characterization of breast-related lesions and tumors (187, 191).

Chest & abdomen datasets for detection: In most research and clinical situations, classification is not enough. Medics and researchers usually focus on the cause of a disease and the localization of the lesion or tumor. Further treatment evaluation, planning, and interpretability are specific focuses for medics and DL researchers. Thus, detection and segmentation are the tasks receiving the most attention at present. The detection task aims to find a region of interest and localize its position. The regions of interest usually include:

– Lung cancer and tumor (173, 180, 189, 195, 197)
– Pulmonary nodule (162, 174, 193, 197, 200)
– Celiac-related damage (202, 203, 204, 206)
– Other lung lesions (172, 183)
– Polyp (198, 204)
– Cervical cancer (182)
– Liver cancer (188)


Table 8: Summary of datasets and challenges for chest and abdomen organ-related tasks I.

Index | Dataset/Challenge | Year | Focus
145 | CT Diagnosis of COVID-19 [145] | 2020 | COVID-19
146 | Covid19Challenge.eu | 2020 | COVID-19
147 | ObjectCXR | 2020 | COVID-19
148 | CORD-19 [146] | 2020 | COVID-19
149 | Covid Chest X-Ray Dataset [147,148] | 2020 | COVID-19
150 | Detection of COVID-19 from Ultrasound [149,150] | 2020 | COVID-19 (ultrasound)
151 | COVID-Net [151] | 2020 | COVID-19
152 | COVID-19 | 2020 | COVID-19
153 | COVID-19 Radiography Database [152] | 2020 | COVID-19
154 | COVID-19 Chest X-Ray | 2020 | COVID-19
155 | COVID-19 CT Segmentation Dataset | 2020 | COVID-19
156 | COVID-19 Lung CT Lesion Segmentation Challenge 2020 | 2020 | COVID-19
157 | CT Images in COVID-19 [153,154] | 2020 | COVID-19
158 | COVID-19 AR [155,156] | 2020 | COVID-19
159 | BIMCV-COVID19 | 2020 | COVID-19
160 | BCS-DBT [157,158] | 2020 | Breast cancer (digital breast tomosynthesis)
161 | Lung-PET-CT-Dx [159] | 2020 | Lung cancer
162 | LNDb Challenge | 2020 | Pulmonary nodule
163 | A-AFMA-Detection | 2020 | Amniotic fluid detection (ultrasound)
110 | C4KC-KiTS [109,110] | 2019 | Kidney tumor
164 | SIIM-ACR Pneumothorax Segmentation | 2019 | Pneumothorax
165 | CT ventilation imaging evaluation 2019 | 2019 | Ventilation imaging (4D-CT)
166 | CheXpert [160] | 2019 | Chest X-Ray
167 | NSCLC-Radiomics-Interobserver1 [84,161] | 2019 | Non-small cell lung cancer (radiotherapy structure sets)
168 | StructSeg 2019 | 2019 | Lung cancer & organs-at-risk
169 | MIMIC-CXR [162,163] | 2019 | Chest image analysis (with electronic health records and reports)
170 | MIMIC-CXR-JPG | 2019 | Chest X-Ray


Table 9: Summary of datasets and challenges for chest and abdomen organ-related tasks II.

Index | Dataset/Challenge | Year | Focus
171 | PadChest | 2019 | Chest X-Ray
172 | RSNA Pneumonia Detection Challenge | 2018 | Pneumonia
173 | ImageCLEF2018 - Tuberculosis | 2018 | Tuberculosis
174 | Lung Fused-CT-Pathology [164,165] | 2018 | Pulmonary nodule (CT with pathology images)
175 | ACAD [166] | 2017 | Cardiac diseases
176 | NSCLC Radiogenomics [160,167] | 2017 | Non-small cell lung cancer
177 | ProstateX | 2018 | Prostate lesion (T2-weighted, proton density weighted, dynamic contrast-enhanced, DE)
178 | Pulmonary Chest X-Ray Abnormalities | 2018 | Tuberculosis
179 | Chest X-Ray Images (Pneumonia) [75,76] | 2018 | Pneumonia
122 | LiTS [116] | 2017 | Liver tumor
180 | Data Science Bowl 2017 | 2017 | Lung cancer
181 | ACRIN-FLT-Breast (ACRIN 6688) [168,169] | 2017 | Breast cancer
182 | Cervical Cancer Screening | 2017 | Cervical cancer (endoscopy)
183 | NIH Chest X-Rays [170] | 2017 | Lung disease
184 | SPIE-AAPM-NCI PROSTATEx Challenges [171,172] | 2016 | Prostate lesion (T2-weighted, proton density weighted, dynamic contrast-enhanced, DE)
185 | ImageCLEFmed: The Medical Task 2016 | 2016 | Report-combined medical image classification
186 | LUNA16 [173] | 2016 | Pulmonary nodule
187 | The Digital Mammography DREAM Challenge | 2016 | Breast cancer (mammography)
188 | Low Dose CT Challenge [174,175] | 2016 | Low dose CT & liver lesion
189 | RIDER Lung CT [176,177] | 2015 | Non-small cell lung cancer
190 | Phantom FDA [178,179] | 2015 | Phantom & pulmonary nodule
191 | BREAST-DIAGNOSIS [180] | 2015 | Breast
192 | Soft-tissue-Sarcoma [181] | 2015 | Soft tissue sarcoma
193 | SPIE-AAPM Lung CT Challenge [182,183,184] | 2014 | Pulmonary nodule
194 | NSCLC-Radiomics [84] | 2014 | Non-small cell lung cancer


Table 10: Summary of datasets and challenges for chest and abdomen organ-related tasks III.

Index | Dataset/Challenge | Year | Focus
133 | Automated Segmentation of Prostate Structures [115] | 2013 | Prostate
195 | NSCLC Radiogenomics: Initial Stanford Study of 26 Cases [167,185] | 2013 | Non-small cell lung cancer
196 | Ventricular Infarct Segmentation | 2012 | Left ventricular myocardial infarction segmentation
197 | LIDC-IDRI [186,187] | 2011 | Lung cancer & pulmonary nodule
198 | CT COLONOGRAPHY [188,189] | 2011 | Polyp & colonography
199 | RIDER Collections | 2011 | Cancer
200 | Automatic Nodule Detection 2009 [190] | 2009 | Pulmonary nodule
201 | VOLCANO09 | 2009 | Pulmonary nodule
202 | EndoCV2021 [191,192] | 2020 | Colon (polyp, cancer), oesophagus (Barrett's, dysplasia, and cancer), and stomach (endoscopy)
203 | EAD2020 | 2020 | Artefact region (endoscopy)
204 | EDD2020 | 2020 | Colon (polyp, cancer), oesophagus (Barrett's, dysplasia, and cancer), and stomach (endoscopy)
205 | SARAS-ESAD [193] | 2020 | Surgeon action detection (endoscopy)
206 | EAD2019 [194] | 2019 | Artefact region (endoscopy)
207 | AIDA-E Subchallenge 1 | 2016 | Mucosa damage in celiac disease (endoscopy)
208 | AIDA-E Subchallenge 2 | 2016 | Mucosa in Barrett's esophagus (endoscopy)
209 | AIDA-E Subchallenge 3 | 2016 | Mucosa in gastric chromoendoscopy (endoscopy)


– Breast cancer (187)
– Action and artefact of surgeon (205, 206)

Chest & abdomen datasets for segmentation: Segmentation is a refinement of the detection task because it provides information about the location together with pixel-level labels. Pixel-level annotations help researchers design pixel-level algorithms for accurate and effective quantification, volume calculation, and other analyses and diagnoses of tumors and lesions at the pixel level (e.g., monitoring of tumor size). According to the datasets and challenges we have collected, most of them aim at the segmentation of tumors and lesions from CT, covering:

– Lung cancer (167, 176)
– Kidney tumor (110)
– Pulmonary nodule (201)
– Pneumothorax (164)
– Liver tumor (122)
– Polyp (204)

Furthermore, challenges (203, 206) focus on the segmentation of artifacts (e.g., polyps) in endoscopic images.
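Pixel-level masks make this kind of quantification direct: for example, a lesion's volume is simply its voxel count multiplied by the physical volume of one voxel. A small NumPy sketch of the idea (the mask and spacing values below are invented for illustration):

```python
import numpy as np

def lesion_volume_ml(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Volume of a binary segmentation mask in millilitres.

    spacing_mm: per-axis voxel spacing in mm, as read from the scan header.
    """
    voxel_mm3 = float(np.prod(spacing_mm))               # mm^3 per voxel
    return mask.astype(bool).sum() * voxel_mm3 / 1000.0  # 1 ml = 1000 mm^3

# toy example: 1000 segmented voxels at 1 x 1 x 2 mm spacing
mask = np.zeros((10, 10, 20), dtype=np.uint8)
mask[:5, :10, :20] = 1                                   # 5*10*20 = 1000 voxels
print(lesion_volume_ml(mask, (1.0, 1.0, 2.0)))           # → 2.0 (ml)
```

Tracking this number across scans is one way tumor growth is monitored over time.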

COVID-19: In 2020, COVID-19 became a research focus because it caused more than 100 million infections and two million deaths. Different datasets and challenges focus on this devastating disease and provide data to help researchers develop deep learning models to detect COVID-19 via various medical imaging modalities.

In terms of modalities, most of these datasets and challenges use either CT or CR images, and some provide both. One exception is dataset (150), which uses ultrasound images. These datasets provide image annotations labeled by radiologists.

Most of these datasets and challenges are related to classification tasks. Datasets (145, 146, 147, 157, 159) directly focus on distinguishing COVID-19 patients from normal subjects. In contrast, datasets and challenges (149, 150, 151, 153, 154) focus on distinguishing COVID-19 from a few other similar diseases which can also lead to lung opacity or other symptoms, such as Middle East Respiratory Syndrome (MERS), Severe Acute Respiratory Syndrome (SARS), and Acute Respiratory Distress Syndrome. Moreover, other datasets and challenges (148, 152) approach the diagnosis task with natural language processing, genomics, or clinical methods.

Similarly, some other datasets (147, 155, 156) focus on the segmentation or detection of COVID-19 related lesions, such as ground-glass opacity, air-containing space, and pleural effusion.

4.3 Datasets for other chest and abdomen-related tasks

Besides the classification, detection, and segmentation tasks, several other tasks are a current focus of research. In the following, we present the datasets and challenges related to these tasks and report them in Table 11.

Chest & abdomen datasets for regression: Similar to attribute classification, regression is another task which aims to compute or measure target attributes from given images; the difference is that the outputs of regression are continuous. A typical example is fetal biometric measurement (217, 227). These challenges provide ultrasound images to help researchers design algorithms that measure such attributes to estimate the gestational age and monitor the fetus's growth. Another example is cardiac measurement (212, 213, 214, 218, 221, 226, 231). These datasets and challenges provide MR or ultrasound images to analyze the heart's attributes to detect heart diseases.
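For the head-circumference challenges, a common pipeline fits an ellipse to the skull contour and reports its perimeter as the regression target. A sketch of the perimeter step using Ramanujan's approximation (the ellipse-fitting step is omitted, and the semi-axis values below are invented for illustration):

```python
import math

def ellipse_circumference(a_mm: float, b_mm: float) -> float:
    """Ramanujan's approximation to the perimeter of an ellipse with
    semi-axes a and b, e.g. an ellipse fitted to the fetal skull contour."""
    return math.pi * (3.0 * (a_mm + b_mm)
                      - math.sqrt((3.0 * a_mm + b_mm) * (a_mm + 3.0 * b_mm)))

# sanity check: for a circle (a == b == r) the formula reduces to 2*pi*r
print(ellipse_circumference(30.0, 30.0))   # → 2 * pi * 30 ≈ 188.5

# invented semi-axes of a fitted skull ellipse, in mm
print(round(ellipse_circumference(50.0, 40.0), 1))
```

The circumference in millimetres, converted via the image's pixel spacing, is then compared against the sonographer's ground-truth measurement.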

Chest & abdomen datasets for tracking: Tracking is a critical task because our bodies and organs move during imaging. For organs such as the heart, the characteristics of their motion are informative. Challenges (222, 224) provide ultrasound data to track the liver for the follow-up of surgery and treatments. Datasets and challenges (215, 228, 229) focus on tracking the heart; they provide ultrasound images to track and analyze it.

Chest & abdomen datasets for registration: Challenge (216) focuses on the CT registration of lungs and provides CT scans with and without contrast agents. Meanwhile, challenges (220, 225) focus on registration between different modalities of the heart and provide MR, CT, and other modalities to register images of beating hearts.

Datasets for other chest & abdomen related tasks: Challenges (210, 223) focus on localizing specific landmarks, including the amniotic fluid and the heart, using ultrasound and MR images. Challenge (211) focuses on the classification of surgery videos. Dataset (232) focuses on the reconstruction of the coronary artery.

5 Datasets and challenges for pathology and blood

Radiography, MR imaging, and other imaging modalities have long been used as a basis for diagnosis.

Table 11: Summary of datasets and challenges of other medical applications in the chest and abdomen.

Index | Dataset/Challenge | Year | Focus
210 | A-AFMA-Localization | 2020 | Amniotic fluid localization
211 | SurgVisDom 2020 | 2020 | Surgical task classification (endoscopy)
212 | CRT-EPIGGY19 | 2019 | Heart modeling (see the official description: http://crt-epiggy19.surge.sh/datasets.html)
213 | LVQUAN19 | 2019 | Full quantification of cardiac LV
214 | LVQUAN18 | 2018 | Full quantification of cardiac LV
215 | EchoNet-Dynamic | 2017 | Heart tracking
216 | LUMIC | 2018 | CT registration with phantom images (CT pulmonary angiography)
217 | HC18 [?,195] | 2018 | Fetal head circumference
218 | SLAWT [196] | 2016 | Left atrial wall thickness
219 | Statistical Atlases and Computational Modelling of the Heart | 2016 | Left atrium wall thickness
220 | MMMWHS17 | 2017 | Multi-modality heart registration
221 | 2nd Annual Data Science Bowl | 2016 | Cardiac ejection fraction
222 | CLUST 2015 | 2015 | Liver tracking
223 | Landmark Detection Challenge | 2015 | Landmark location
224 | CLUST 2014 | 2014 | Liver tracking
225 | Motion Correction Challenge | 2014 | Heart motion correction
226 | Coronary Artery Stenoses Detection and Quantification Evaluation [197] | 2012 | Coronary artery stenoses detection and quantification (CT angiography)
227 | Challenge US: ISBI 2012 [198] | 2012 | Fetal biometric measurements
228 | Motion Tracking Challenge | 2011 | Heart motion tracking
229 | Cardiac Motion Analysis Challenge 2011 | 2011 | Heart motion tracking
230 | EMPIRE10 | 2010 | Lung registration
231 | LV Mechanics Challenge | 2009 | Modeling
232 | Rotterdam Coronary Artery Algorithm Evaluation Framework | 2009 | Reconstruction (generation)


Pathology images are also used as a gold standard for diagnosis, particularly for tumors and lesions. Digital pathology images are generally obtained by collecting tissue samples, making slices, staining, and imaging. Therefore, pathology images are one of the mainstream image modalities used for diagnosis.

The focus of these datasets and challenges includes 1) the identification and segmentation of basic elements (e.g., cells and nuclei) in pathology images, and 2) blood-based diagnosis from images. In this section, we present the datasets and challenges for pathology images (Subsection 5.1) and cover the datasets and challenges for blood images in Subsection 5.2.

5.1 Datasets & challenges for pathology

Pathology images are used as the basis of cancer diagnosis. Pathologists and automatic algorithms analyze the images based on specific features, such as cancer cells and cells under mitosis. Many organizations and researchers provide datasets and challenges which focus both on microcosmic pathology and on the whole slide image (WSI) level. The relevant datasets and challenges are listed in Table 12.

Imaging datasets & challenges: In most situations, WSI is used in pathology diagnosis. Unlike CT or MR images, a pathology image is an optical image similar to a picture taken by a camera. One major difference is that a pathology image is formed by transillumination, while a usual photo is formed by reflection. Another difficulty lies in the size of the image. A WSI is stored in a multi-resolution pyramid structure. A single multi-resolution WSI is generally obtained by capturing many small high-resolution image patches, and it might contain up to billions of pixels. Thus, WSI serves as a virtual microscope in diagnosis and clinical research, and many challenges use WSIs, such as (237, 245, 246, 254, 255, 256). However, in some situations, WSI is not suitable for analysis tasks, for example, cell segmentation. Therefore, pathology image patches are used in several other challenges, such as (233) for visual question answering, (259) for mitosis classification, and (238, 248) for multi-organ nucleus detection and segmentation.
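The pyramid structure can be pictured as a stack of progressively downsampled copies of the full-resolution scan, from which fixed-size patches are read at a chosen level. A toy NumPy sketch of that access pattern (real WSIs are read with dedicated libraries such as OpenSlide; the stand-in array below merely illustrates the levels-and-patches idea):

```python
import numpy as np

def build_pyramid(level0: np.ndarray, levels: int) -> list:
    """Toy multi-resolution pyramid: each level halves the previous one
    by 2x2 block averaging (WSI formats store such levels precomputed)."""
    pyramid = [level0]
    for _ in range(1, levels):
        img = pyramid[-1]
        h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
        img = img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(img)
    return pyramid

def read_patch(pyramid: list, level: int, xy: tuple, size: tuple) -> np.ndarray:
    """Read a (h, w) patch at a given level, like a virtual microscope."""
    x, y = xy
    h, w = size
    return pyramid[level][y:y + h, x:x + w]

slide = np.random.rand(1024, 1024)      # stand-in for a gigapixel level 0
pyr = build_pyramid(slide, levels=4)    # levels of 1024, 512, 256, 128
patch = read_patch(pyr, level=2, xy=(10, 10), size=(64, 64))
print([p.shape for p in pyr], patch.shape)
```

Reading small patches at an appropriate level is what makes billion-pixel slides tractable for deep learning pipelines.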

Datasets for stain: Slides made from human tissue have no color and are required to be stained. The commonly used stains include hematoxylin, eosin, and diaminobenzidine. Usually, two or more stains are used to stain a slide, and the most commonly used combinations include hematoxylin & eosin (H&E) and hematoxylin & diaminobenzidine (H-DAB).
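The choice of stains matters for automated analysis because algorithms often begin by separating the contribution of each dye, a step known as color deconvolution. A compact NumPy sketch using the standard Ruifrok-Johnston H&E stain vectors (a simplified illustration with a synthetic pixel, not a full preprocessing pipeline):

```python
import numpy as np

# Ruifrok-Johnston optical-density vectors (rows): hematoxylin, eosin,
# and a residual channel orthogonal to both.
H = np.array([0.650, 0.704, 0.286])
E = np.array([0.072, 0.990, 0.105])
R = np.cross(H, E)
R /= np.linalg.norm(R)
M = np.stack([H / np.linalg.norm(H), E / np.linalg.norm(E), R])

def separate_stains(rgb: np.ndarray) -> np.ndarray:
    """Map RGB intensities in [0, 1] to per-stain concentrations
    by inverting the stain matrix in optical-density space."""
    od = -np.log10(np.clip(rgb, 1e-6, 1.0))   # Beer-Lambert optical density
    return od @ np.linalg.inv(M)

# synthesize one pixel of pure hematoxylin at concentration 0.5
pixel = 10.0 ** (-0.5 * M[0])
conc = separate_stains(pixel[None, :])[0]
print(np.round(conc, 3))                      # hematoxylin channel ≈ 0.5
```

Separating the hematoxylin channel this way is a common first step before nucleus detection, since hematoxylin binds to nuclei.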

Pathology datasets according to disease: The pathology slides are widely used in the diagnosis of many diseases, especially cancer. Cancer cells and tissues have different shapes compared to their normal counterparts; thus, diagnosis via pathology is the gold standard. Many datasets and challenges, such as (238, 248, 260), do not address any specific disease. At the same time, many datasets and challenges target specific diseases, such as breast cancer (239, 244), myeloma (262, 268), cancers of the digestive system (241), cervical cancer (257, 258), lung cancer (247), thyroid cancer (253), and osteosarcoma (240).

Pathology datasets according to task: Generally speaking, the tasks of these datasets and challenges can be classified into two categories: microcosmic tasks and WSI-level tasks. The latter target the diagnosis of diseases, based on a classification task. Expanding on simple classification, many datasets and research methodologies focus on more complex tasks, such as the segmentation of tumor cell areas (238, 248, 249) and the detection of pathological features (241, 255). The microcosmic tasks derive from clinical analysis: they identify cells and detect mitoses to extract key features from pathology images that support further disease diagnosis. The following subsections expand on the microcosmic and WSI-level tasks, respectively.

5.1.1 Microcosmic related datasets

Microcosmic tasks focus on the extraction of microcosmic features (e.g., nucleus features) for further diagnosis and for WSI-level tasks. In this subsection, we introduce the microcosmic task related datasets and challenges.

Data: Unlike the WSI-level, the datasets and chal-lenges which focus on microcosmic tasks usually pro-vide small size patch-level images with high-resolution.These patches are suitable for the annotation ofmicrocosmic-level objects and resource-limited algo-rithms. The size of images varies depending on theimage analysis tasks. For the segmentation and detec-tion of cells and nucleus, the size of images is usuallya thousand-pixel square to contain the suitable num-ber of cells or nuclei. For individual cell analysis tasks(e. g., mitosis determination), the size is usually of asingle cell. For other tasks (e. g., the patch-level classi-fication), the size varies from dataset to dataset.
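Extracting such patches from a slide amounts to computing a tiling grid over a region. A minimal sketch follows; the 1,000-pixel patch size follows the text, while the region size and stride are illustrative assumptions.

```python
def patch_grid(region_w, region_h, patch=1000, stride=1000):
    """Top-left (x, y) coordinates of patches covering a region.

    With stride == patch the tiles are non-overlapping; a smaller
    stride yields overlapping tiles.
    """
    return [(x, y)
            for y in range(0, region_h - patch + 1, stride)
            for x in range(0, region_w - patch + 1, stride)]

# a 5000 x 3000 pixel region yields a 5 x 3 grid of 1000-pixel patches
coords = patch_grid(5000, 3000)
print(len(coords))  # 15
```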

Table 12: Summary of datasets and challenges for pathology-related image analysis.

Ref | Dataset/Challenge | Year | Focus | Tasks | Stain
233 | PathologyVQA [199] | 2020 | Visual question answering | Other (visual question answering) | H&E
234 | PANDA Challenge | 2020 | Prostate cancer grade assessment | Classification | H&E
235 | PAIP 2020 | 2020 | Colorectal cancer | Classification, Segmentation | H&E
236 | MICCAI 2020 CRPCC | 2020 | Combined radiology and pathology classification | Classification | H&E
237 | HEROHE | 2020 | HER2 status | Classification | H&E
238 | MoNuSAC [200] | 2020 | Multi-organ nucleus | Segmentation, Detection | H&E
239 | Post-NAT-BRCA [201, 202] | 2019 | Cell | Classification, Segmentation, Other (cell counting) | H&E
240 | Osteosarcoma data for Viable and Necrotic Tumor Assessment [203, 204, 205, 206, 207] | 2019 | Osteosarcoma pathology image | Classification | H&E
241 | DigestPath2019 [208] | 2019 | Signet ring cell | Classification, Segmentation, Detection | H&E
242 | LYON19 [209] | 2019 | Lymphocyte | Detection | IHC (immunohistochemistry)
243 | ANHIR [210, 211] | 2019 | Pathology image registration | Other (non-linear image registration) | -
244 | Breast Metastases to Axillary Lymph Nodes [212, 213] | 2019 | Breast cancer metastases to lymph nodes | Classification | H&E
245 | Gleason 2019 | 2019 | Gleason grade/score prediction | Classification | H&E
246 | PAIP 2019 | 2019 | Segmentation of liver cancer & tumor burden | Segmentation, Detection | H&E
247 | ACDC@LUNGHP [214] | 2019 | Lung cancer | Classification, Segmentation | H&E
248 | MoNuSeg [215] | 2018 | Multi-organ nucleus | Segmentation | H&E
249 | Data Science Bowl 2018 | 2018 | Cell | Segmentation | H&E
250 | PatchCamelyon [216, 217] | 2019 | Pathology image patch | Classification | H&E
251 | ICIAR 2018 BACH [218] | 2018 | Patch of breast cancer | Classification | H&E
252 | Colorectal Histology MNIST | 2018 | Colorectal pathology patch | Classification | H&E
253 | Challenge for TMA in Thyroid Cancer Diagnosis | 2017 | Thyroid cancer | Classification | H&E
254 | CAMELYON17 [217, 219, 220] | 2017 | Breast cancer metastases | Classification, Detection | H&E
255 | CAMELYON16 [217] | 2016 | Breast cancer metastases | Classification, Detection | H&E
256 | GlaS | 2015 | Colorectal cancer pathology patch | Segmentation | H&E
257 | 2nd OCCIS Challenge | 2015 | Overlapping cervical cell | Segmentation | Papanicolaou
258 | OCCIS | 2014 | Overlapping cervical cell | Segmentation | Papanicolaou
259 | MITOS-ATYPIA-14 | 2014 | Mitosis | Classification | H&E
260 | Cell Tracking Challenge [221] | 2020 | Cell | Other (tracking) | - (electron microscope)
261 | Particle Tracking Challenge | 2012 | Particle | Other (tracking) | - (electron microscope)

Pathology datasets for cell detection & segmentation: Cells are the essential elements of a pathology image, and analyzing them is one of the most effective ways to extract pathology image features for diagnosis. Pathologists analyze the size, shape, pattern, and stain color of the cells, drawing on their knowledge and expertise to classify them as normal or abnormal. Thus, many datasets and challenges focus on the segmentation and detection of cells. Cells and nuclei may be placed neatly on the slide; however, during slide preparation, they can overlap or be located randomly. Aiming at this problem, challenges (257, 258) focus on the segmentation and detection of overlapping cells and nuclei. The shape and size of cells from different organs can differ, posing different recognition and analysis challenges; therefore, challenges (238, 248) focus on multi-organ cell and nucleus segmentation.

Pathology datasets for patch-level classification: Generally, a WSI is too large to analyze every cell and the relationships between cells. DL-based methods can readily extract the essential information from patch-level images to support diagnosis through feature learning, and many datasets and challenges focus on this problem. The datasets and challenges that provide patch-level images mainly target classification, segmentation, or detection tasks. Owing to the quality of its feature learning, DL has reached state-of-the-art performance in many areas of computer vision. Therefore, some datasets and challenges focus on the patch itself rather than the cell: the tasks vary from the segmentation, detection, and classification of cells to the direct classification of the patch. Challenges (242, 250, 252, 253) focus on patch-level image classification, for instance to determine whether metastatic or other tissue is present.

Datasets for other pathology tasks: Besides the detection and segmentation of cells and patch-level classification, there are other microcosmic tasks. Challenge (259) focuses on mitosis detection for nuclear atypia scoring; the atypical shape, size, and internal organization of cells are related to the progression of cancer, and the more advanced the cancer, the more atypical the cells look. Challenge (260) focuses on cell tracking, i.e., understanding how cells change shape and move as they interact with their surrounding environment. This is key to understanding the mechanobiology of cell migration and its multiple implications in normal tissue development and in many diseases. Challenge (233) focuses on visual question answering on pathology images, where the model is trained to pass a pathologist's examination.

5.1.2 Datasets for WSI-level tasks

WSI-level pathology tasks focus on the diagnosis of cancer and on pathology image processing. A WSI contains the complete information about a patient's sample needed to establish an accurate diagnosis, and automatic diagnosis algorithms can analyze a slide quickly. This is especially useful in developing countries, where well-experienced pathologists are scarce. However, directly analyzing a WSI is challenging for both pathologists and algorithms, because its size can be up to 100,000 × 100,000 pixels. To address this, most current datasets and challenges focus on the classification and segmentation of biomarkers, cells, and other regions of interest. At the end of this subsection, we introduce other datasets and challenges related to regression tasks and to the localization of tumors and biomarkers.

Datasets for classification of WSI: The prime goal of examining pathology images, especially WSI, is to diagnose cancer. Thus, how to classify very large WSIs with limited computing resources has become a research challenge. Datasets and challenges (234, 236, 237, 245) focus on predicting cancer or evaluating a WSI, for example via the Gleason grade or HER2 status. At the same time, some datasets and challenges (244, 254, 255) focus on the classification of metastasized cancer.

Datasets for segmentation and detection in WSI: DL-based methods are often seen as a black box that processes pathology images. Although these methods have achieved state-of-the-art performance, their interpretability remains limited. From the pathologists' point of view, datasets and challenges (235, 241, 246, 254, 255) therefore focus on segmentation and detection tasks to determine the critical elements which lead to a particular diagnosis, such as cancer cell areas and signet ring cells.

Datasets for other WSI tasks: Besides classification and detection, a few other tasks are based on WSI, including the registration of pathology images (243) for data pre-processing and the localization of lymphocytes (242).

5.2 Blood-related datasets

Blood image analysis is the basis of the diagnosis of many diseases. In contrast to pathology images, images of blood samples mainly contain blood cells,

Table 13: Summary of datasets and challenges for blood-related image analysis tasks.

Ref | Dataset/Challenge | Year | Focus | Tasks | Stain
262 | SegPC 2021 | 2020 | Myeloma plasma cell | Segmentation | Jenner-Giemsa
263 | MitoEM Challenge [222] | 2020 | Mitochondria | Segmentation | - (electron microscope)
264 | Single-cell Morphological Dataset of Leukocytes [223, 224] | 2019 | Blast cells in acute myeloid leukaemia | Classification, Segmentation | -
265 | B-ALL Classification [225, 226, 227] | 2019 | Immature leukemic blasts | Classification, Segmentation | -
266 | Malaria Bounding Boxes | 2019 | Cells in blood | Classification, Detection | Giemsa
267 | LYSTO | 2019 | Lymphocytes | Detection | -
268 | MiMM_SBILab Dataset [228, 229, 230] | 2019 | Cell | Segmentation | Jenner-Giemsa
269 | SN-AM Dataset [229, 231, 232, 233] | 2019 | Stain normalization | Other (stain normalization) | Jenner-Giemsa
270 | C-NMC 2019 Dataset [226] | 2019 | Immature leukemic cell classification | Classification | -
271 | Blood Cell Images | 2018 | Cells in blood | Classification | Jenner-Giemsa

and these datasets and challenges are aimed at blood-related cancer and cell counting. Similar to pathology images, these datasets and challenges also focus on the segmentation, detection, and classification of cells. The relevant datasets and challenges are listed in Table 13.

One of the primary tasks of these datasets is the classification of cells, which focuses on identifying the different types of cells. Dataset (271) focuses on classifying red blood cells, white blood cells, platelets, and other cells, while dataset (264) focuses on the classification of malignant and non-malignant cells. Other datasets and challenges, namely (268) (multiple myeloma segmentation), (263) (mitochondria segmentation), and (266) (malaria detection), focus on the segmentation and detection of blood cells and biomarkers.

6 Other datasets

Although we have categorized the datasets and challenges into three parts, “head and neck”, “chest and abdomen”, and “pathology and blood”, several other datasets cannot be categorized under these three areas. In this section, we introduce the datasets and challenges categorized under “other”, which means that they do not fit the above categories but are still relevant to DL methods. The topics of this section include bone (Subsection 6.1), skin (Subsection 6.2), phantom (Subsection 6.3), and animal (Subsection 6.4).

6.1 Bone-related datasets

Medical image analysis of bone is currently a major research focus. Radiography is the most effective way to image bones, because X-rays are sensitive to the calcium that makes up human bones. The segmentation of bones, the detection of abnormalities, and their characterization are meaningful clinical and research tasks. The following subsections therefore discuss the datasets and challenges for classification, segmentation, and other tasks; Table 14 reports these datasets and challenges.

Bone datasets for classification: Classification tasks for bone-related computer-aided diagnosis are the focus of many researchers. Though classification cannot locate the regions of interest, it can still help orthopedists judge whether a patient is healthy, as in dataset (283). The diagnosis of tears and abnormalities is also a research focus, for example meniscal tears (279), vertebral fractures (282), and knee abnormalities (279).

Table 14: Summary of datasets and challenges for bone-related image analysis tasks.

Ref | Dataset/Challenge | Year | Modalities | Focus | Tasks
272 | KNOAP2020 | 2020 | MR, CR | Knee osteoarthritis | Classification
273 | MICCAI 2020 RibFrac Challenge [234] | 2020 | CT | Rib fracture detection and classification | Classification, Other (detection)
274 | Spinal Cord MRI Public Database | 2020 | MR | Bone imaging | Other (generation)
275 | VerSe20 | 2020 | CT | Vertebrae | Segmentation
276 | VerSe19 [235, 236] | 2019 | CT | Vertebrae | Segmentation
277 | Pelvic Reference Data [237] | 2019 | CT | Pelvic images | Other (registration)
278 | AASCE19 | 2019 | CR | Spinal curvature | Other (regression)
279 | MRNet Dataset [238] | 2018 | MR | Knee | Classification
280 | MURA [239] | 2018 | CR | Abnormality in musculoskeletal images | Classification
281 | xVertSeg Challenge | 2016 | CT | Fractured vertebrae | Classification, Segmentation
282 | CSI 2016 | 2016 | CT | Intervertebral disc and vertebral fracture | Classification, Segmentation, Other (detection)
283 | Bone Texture Characterization [240] | 2014 | CR | Bone texture | Classification
284 | Spine and Vertebra Segmentation Challenge [235, 236, 241] | 2014 | CT | Spine | Segmentation
285 | SKI10 | 2010 | MR | Cartilage and bone in knee | Segmentation

Bone datasets for segmentation: The segmentation of bone images plays a vital role in clinical diagnosis and treatment. Computer-aided segmentation algorithms and orthopedists need to segment the different parts of the bone from a given image and make a sound judgment in order to provide more adequate treatment. The difficulty with such tasks is the low resolution of the images compared with other modalities. The focus of these datasets and challenges includes the spine (282, 284), vertebrae (275, 276, 281), and knee cartilage (285).

Other bone-related tasks: Besides the classification and segmentation tasks, the bone datasets and challenges also cover imaging (274), registration (277), spinal curvature estimation (278), labeling (276), and abnormality detection (280).

6.2 Skin-related datasets

Skin cancer is one of the most common types of cancer, and melanoma is one of the most lethal types of skin cancer. To diagnose skin cancer, dermoscopy is used to image the skin, and classification, segmentation, and detection tasks are employed. The most relevant datasets and challenges are reported in Table 15.

Aiming at the computer-aided diagnosis of melanoma, ISIC released datasets and a series of challenges for clinical training and for the development of automatic algorithms. The ISIC challenges include ISIC 2017 (290), 2018 (289), and 2019 (288). Challenges (289, 290) comprise three sub-challenges, lesion segmentation, lesion attribute detection, and lesion classification, with thousands of dermoscopic images. Challenge (288) focuses on the classification of melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis, benign keratosis, dermatofibroma, vascular lesion, squamous cell carcinoma, and others. Challenge (287), i.e., ISIC 2020, focuses on the classification of melanoma, with 33,126 scans from more than 2,000 patients, to better support dermatological clinical work.

Moreover, challenge (286) focuses on diabetic foot ulcers (DFU). The challenge provides more than 2,000 images of feet, photographed with regular cameras under a consistent light source and annotated by experts, for the training and testing of automatic detection and classification algorithms.

6.3 Phantom-related datasets

A phantom is an object made of specific materials, used mainly to evaluate medical imaging equipment. Phantoms

Table 15: Summary of datasets and challenges for skin, phantom, and animal related image analysis tasks.

Ref | Dataset/Challenge | Year | Modalities | Focus
286 | DFU 2020 [242] | 2020 | RGB | Diabetic foot ulcer detection
287 | SIIM-ISIC Melanoma Classification [243] | 2020 | RGB | Melanoma classification
288 | ISIC 2019 [244, 245] | 2019 | RGB | Classification of 9 diseases
289 | ISIC 2018 [244, 245] | 2018 | RGB | Lesion segmentation, lesion attribute classification, and disease classification for melanoma
290 | ISIC 2017 [246] | 2017 | RGB | Lesion segmentation, lesion attribute classification, and disease classification for melanoma
291 | MATCH | 2020 | OT | Tumor tracking in markerless lung
292 | MRI-DIR [247, 248] | 2018 | MR, CT | Multi-modality registration with phantom
293 | CC-Radiomics-Phantom-3 [249, 250] | 2018 | CT | Phantom in different machines
294 | CC-Radiomics-Phantom-2 [251] | 2018 | CT | Feature variability assessment with phantom
295 | Credence Cartridge Radiomics Phantom CT Scans [252] | 2017 | CT | Phantom research
296 | PET-Seg Challenge | 2016 | CT, PET | Phantom registration and research
297 | ISMRM 2015 | 2015 | DWI | Fiber imaging from phantom
298 | RIDER Phantom PET-CT [253] | 2015 | CT, PET | Phantom research & registration
299 | Mouse awake rest [254] | 2020 | MR | Mouse brain segmentation
300 | EndoVis 2019 SCARED | 2019 | Endoscopy | Depth estimation from endoscopic data
301 | MouseLemurAtlas MRIraw [255, 256] | 2019 | MR | Mouse lemur brain image segmentation
302 | CPTAC-GBM [257] | 2018 | MR, CT | Glioblastoma multiforme
303 | BigNeuron | 2016 | General microscopy | Animal neuron reconstruction
304 | Apples-CT | 2020 | CT | Apple reconstruction and segmentation
305 | QUBIQ | 2020 | MR, CT | Quantification of uncertainties in biomedical image quantification
306 | AnDi [258] | 2020 | DWI | Anomalous diffusion
307 | The Open Knowledge-Based Planning Challenge | 2020 | CT | Dose distribution prediction
308 | Learn2Reg [259] | 2020 | MR, CT | Image registration
309 | SNEMI3D | 2013 | Electron microscope | Segmentation of neurites
310 | SNEMI2D | 2012 | Electron microscope | Segmentation of neuronal structures

can be used to register different pieces of equipment and as data for the development of automatic algorithms. Registration is essential for clinical diagnosis; for instance, it reduces the differences between medical devices with or without the same modalities (293, 294, 295, 298). When data is scarce, phantoms become useful. Some image analysis tasks or experiments require surgically inserted fiducial markers, which are costly and risky; a phantom, in contrast, has low cost and risk and is easy to image and annotate (291, 296, 297). The related datasets and challenges are reported in Table 15.

6.4 Animal-related datasets

Medical image analysis of animal material is a relatively small research area. However, it is not as limited by the privacy and strict ethics restrictions that apply to human medical images. The datasets and challenges we found focus on animal brain segmentation (299, 301), depth estimation from endoscopic data (300), and multi-modality registration (292). The relevant datasets and challenges are reported in Table 15.

7 Discussions

The success of AI algorithms such as DL has led to their widespread use in several fields, including medical image analysis. Researchers with different knowledge and backgrounds tackle image-based clinical tasks using computer vision tools to design automatic algorithms for different applications [11, 12, 260, 261, 262, 263, 264]. Though AI algorithms can successfully handle many tasks, several unsolved problems and challenges hinder the development of AI-based medical image analysis.

7.1 Problems and challenges

DL-based algorithms learn from input images of real data through gradient descent. Large-scale annotated datasets and a powerful DL model are key to the development of successful DL methods. For example, the success of AlexNet [14], GoogLeNet [2], and ResNet [3] is based on powerful models with millions of parameters. At the same time, a large-scale dataset such as ImageNet [265] is necessary to train the DL model and tune such a large number of parameters. However, when these methods are applied to medical image analysis, many domain-specific problems and challenges appear. This subsection discusses some of these challenges.

7.1.1 Data scarcity

The biggest challenge in the development of DL models is data scarcity. Unlike in other areas, medical image datasets are usually smaller in scale due to many limitations, e.g., ethical restrictions.

The datasets commonly used in traditional computer vision are larger in scale than medical image datasets. For example, the handwritten digits dataset MNIST [266] includes a training set of 60,000 examples and a test set of 10,000 examples; the ImageNet dataset [265] includes three million images for training and testing; and Microsoft COCO [267] includes more than two million images with annotations. In contrast, many medical image datasets only include hundreds or, at most, thousands of images. For example, the challenge BraTS 2020 (30) includes four hundred subjects with different modalities for each subject; the challenge REFUGE (70) provides about 1,200 images of the eye; the challenge LUNA 16 (186) provides 888 CT scans; our recently published dataset of pulmonary lesions [268] provides just 694 scans; and the challenge CAMELYON 17 (254) contains only slightly more than 1,000 WSI pathology images.

There are multiple reasons for the lack of data. The main cause is the restricted access to medical images by non-medical researchers, i.e., the barriers between disciplines. The root causes of these barriers are the cost and difficulty of annotation and the restricted access due to ethics and privacy.

Access to data: As mentioned in the introduction, the direct cause of data scarcity is that most non-medical researchers are not allowed to access medical data directly. Though a large amount of medical data is generated worldwide every day, most non-medical researchers have no authorization to access clinical data. The easily accessible data are the publicly available datasets, but these are not large enough to properly train a DL model.

Ethical reasons: The ethics of medical data usage is a major bottleneck and limitation for researchers, particularly computer scientists. Medical data stored in databases often contain sensitive or private information, such as name, age, gender, and ID number. In some cases, the medical images themselves can be used to identify a patient; for example, if an MR scan includes the face, an intruder could identify the patient for malicious purposes. In most countries and regions, it is illegal to distribute such data with private information without the patients' permission, and hardly anyone would consent to such distribution. Therefore, it is practically impossible for deep learning researchers to gain authorization to access these raw datasets.

Even for desensitized data, DL researchers still need to pass ethical review before gaining authorization.
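At the metadata level, the desensitization mentioned above can be sketched as stripping direct identifiers from a record before release. The field names below are purely illustrative assumptions; real de-identification (e.g., of DICOM headers, or defacing of MR volumes) must satisfy the applicable regulations and ethics review.

```python
# Illustrative only: remove direct identifiers from an image record.
DIRECT_IDENTIFIERS = {"name", "id_number", "birth_date", "address"}

def desensitize(record):
    """Drop fields that directly identify the patient."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

record = {"name": "Jane Doe", "id_number": "12345", "age": 54, "modality": "MR"}
print(desensitize(record))  # {'age': 54, 'modality': 'MR'}
```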

Annotation: Another root cause is the difficulty of annotating medical images. Unlike other computer vision areas, the annotation of medical images requires professional expertise. For example, in autonomous driving, there are no specific requirements on annotators for objects such as vehicles and pedestrians, because most of us can easily distinguish a car or a person. When annotating medical images, however, domain-specific knowledge is essential: a layperson may at best tell abnormal tissue from normal tissue, but it is impossible for a non-specialist to delineate a lesion's contour or diagnose a disease.

This difficulty cannot easily be solved even when professionals are employed to annotate the data. First, the cost of annotating medical data is huge. Once researchers and their organization have obtained some data, they need to spend more money to employ medics for its labeling. Such annotation costs are enormous, particularly where medical resources are scarce or where medical costs are high. For example, the challenge PALM (74) provides about 1,200 images with annotations, but its organizers could involve only two clinical medics. Second, the physicians who annotate the data are required to have rich clinical and diagnostic experience, which further reduces the number of people suitable for the task. Third, to avoid subjectivity, each image needs to be annotated by two or more physicians, which raises the question of what to do when the annotators' labels disagree. In many challenges, the organizers employ several junior physicians to annotate and a senior physician to arbitrate when the junior physicians' annotations differ. For example, in the challenge AGE (71), each annotation is determined as the mean of four independent ophthalmologists in a group and is then manually verified by a senior glaucoma expert.

7.1.2 Limitation of medical data

The characteristics of medical images themselves pose difficulties for medical image analysis tasks.

Many types and modalities of images are used in medical image analysis. As in computer vision, the modalities include both 2D and 3D. However, medical images have several other differences: though the average scale of a medical image dataset is smaller than that of computer vision datasets, each data sample is, on average, larger.

For 2D images, CR, WSI, and other modalities show a larger variance in resolution and color than other computer vision fields. Some modalities need more bits to encode a pixel, while some modalities are simply huge. For example, CAMELYON 17 (254) only includes about a thousand pathology images, but the whole dataset is about three terabytes. Such datasets, with few but very large samples, pose a challenge for AI algorithms, and it has become a research focus to design algorithms that can learn under limited resources (e.g., a small number of labeled samples) and still be useful for clinical diagnosis.

3D medical images such as CT and MRI are dense 3D data, in contrast to sparse data such as the point clouds used in autonomous driving. As in the BraTS serial challenges (30, 31, 32, 33, 34, 35, 36, 37, 38), many researchers face the challenge of designing algorithms that can effectively learn from multi-modal datasets.

These characteristics of medical images require well-designed algorithms with a robust capability to fit the data without overfitting. That, in turn, leads to the need for more data and resources; learning suitable features from a small-sample dataset remains a challenge.

7.2 No silver bullet

The ideal scenario would be a method or algorithm that solves all of these problems simultaneously. However, there is no silver bullet. The problems and challenges related to the data and the adopted methods cannot be entirely resolved, and sometimes one problem arises as another is solved. Nevertheless, many ideas have been introduced to address the current problems, and we present them in this subsection.

With respect to the problems and challenges mentioned above, researchers are working in two directions: 1) more effective models that need less data, and 2) more practical approaches to access data. For learning with small datasets, researchers use approaches such as few-shot learning and transfer learning. To access more data, researchers mainly adopt three approaches: federated learning, lifelong learning, and active learning.

7.2.1 Practical learning from small samples

Many medical image datasets have a small number of samples. For example, the challenge MRBrainS13 (6) only includes 20 subjects for training and testing, while the challenge KiTS 19 (110) has about two hundred subjects. Therefore, many researchers seek practical approaches to learning from small samples.

Few-shot learning and zero-shot learning: Few-shot learning addresses one of the critical problems of DL-based medical image analysis, i.e., the development of DL models with less data. Humans can learn effectively from a few samples: unlike standard deep learning methods, a human learns to diagnose a disease from images without needing to view tens of thousands of them (i.e., from only a few shots). Meta-learning, also called learning to learn, is one solution to the few-shot learning problem; it learns meta-features from small amounts of data. The number of images in most medical datasets and challenges is small compared to regular computer vision datasets. Mondal et al. [269] use few-shot learning and a GAN to segment medical images, modifying the GAN for semi-supervised learning in a few-shot setting. Similar to few-shot learning, zero-shot learning aims at novel samples; Rezaei et al. [270] review zero-shot learning from autonomous vehicles to COVID-19 diagnosis. However, zero-shot and few-shot learning also have their disadvantages, such as the domain gap, overfitting, and limited interpretability.
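The few-shot setting can be made concrete by the episodic sampling used in meta-learning: each training episode is a small N-way K-shot classification problem drawn from the dataset. The sketch below is generic and not tied to any cited method; the class names and sizes are illustrative.

```python
import random

def sample_episode(dataset, n_way=2, k_shot=5, q_queries=5, seed=0):
    """Sample one N-way K-shot episode (support + query sets) from
    a mapping {class_label: [samples]}."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)   # pick N classes
    support, query = [], []
    for c in classes:
        items = rng.sample(dataset[c], k_shot + q_queries)
        support += [(x, c) for x in items[:k_shot]]  # K labeled "shots"
        query += [(x, c) for x in items[k_shot:]]    # held-out queries
    return support, query

data = {c: list(range(20)) for c in ["normal", "lesion", "artifact"]}
support, query = sample_episode(data)
# 2 classes x 5 shots -> 10 support pairs, 10 query pairs
```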


Knowledge transfer: Transfer learning is another method, which recognizes and applies knowledge and skills learned in a previous task. For example, white and gray matter segmentation and multi-organ segmentation are both segmentation tasks. Their networks are usually trained independently, i.e., hardly anyone trains one neural network on both tasks at once, but that does not mean the two tasks are unrelated. Besides zero-shot and few-shot learning, transfer learning, or knowledge transfer, is another way to reuse knowledge from a previously learned task; it can be applied between two similar tasks and between different domains. Its most significant advantage is that a large-scale dataset can be used to pre-train the neural network, which is then fine-tuned and transferred to the main task on a dataset with few samples.
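The pre-train-then-fine-tune recipe can be sketched as a parameter update that leaves the pre-trained backbone frozen and touches only a small task-specific head. This is a toy stand-in with scalar "parameters", not a real network.

```python
def finetune_step(params, grads, lr=0.1, trainable=("head",)):
    """One gradient step that updates only the trainable parameter groups;
    frozen groups keep their pre-trained values."""
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }

params = {"backbone": 1.00, "head": 0.50}   # pre-trained values
grads = {"backbone": 0.30, "head": 0.20}    # gradients on the small target dataset
params = finetune_step(params, grads)
# backbone stays at 1.00; head moves toward 0.48
```

Freezing the backbone drastically reduces the number of parameters that the few target samples must constrain, which is the core reason the recipe works with small datasets.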

7.2.2 Effective access to more samples

Besides finding practical approaches to learn from small samples, many researchers have been working on active learning and on federated learning, which aims to use data without accessing sensitive information. This also reduces the annotation costs of deep learning algorithms.

Federated learning: Federated learning provides another way to use data. As discussed previously, the limitations on accessing data stem from privacy and related concerns. Instead of directly sharing data, federated learning shares the model, so that private information is not leaked. Combined with other privacy-protection methods, federated learning can effectively use the data of each independent data center or medical center.

However, federated learning has two disadvantages, concerning annotation and implementation. The problem of annotation cannot be solved by sharing models and requires other methods. The main challenge is the implementation, as only a few institutions have attempted federated learning so far. For example, Intel and other institutions have applied federated learning to brain tumor-related tasks in their research [271]. The main challenges in such an implementation include:

1) the implementation and proof of privacy protection,

2) the methodology for sharing and updating millions of model parameters, and

3) preventing attacks on the DL algorithms and leaks of private data on the Internet or on the computing nodes.
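The parameter-sharing step can be illustrated with federated averaging (FedAvg), a widely used aggregation rule that weights each client's parameters by its local dataset size. The sketch below uses toy parameter vectors and made-up client sizes; it is not taken from the cited study.

```python
def fed_avg(client_weights, client_sizes):
    """Aggregate client models: the global parameter vector is the
    dataset-size-weighted average of the client parameter vectors."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# two simulated hospitals with 100 and 300 local samples
global_model = fed_avg([[0.2, 0.4], [0.6, 0.8]], [100, 300])
print(global_model)  # [0.5, 0.7]
```

Only these parameter vectors cross institutional boundaries; the images themselves never leave the local centers, which is the privacy argument behind the approach.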

Natural language processing: Natural language processing is also a potential tool to automatically or semi-automatically annotate medical image data. It is standard procedure for a medic to write a diagnostic report for the patient, particularly after a medical image has been taken. Such large amounts of paired data (image and text) are therefore useful for medical image analysis after desensitization, and natural language processing can be used for annotation. Several natural language processing-based methods, e.g., [272, 273, 274], have been applied in medical-related research fields.
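A crude sketch of such NLP-assisted annotation: derive weak image labels from the free-text report attached to each scan. The finding vocabulary and the negation handling below are deliberately simplistic and purely illustrative; real systems use robust clinical NLP rather than a single regular expression.

```python
import re

FINDINGS = {"nodule": r"\bnodules?\b", "effusion": r"\beffusions?\b"}
NEGATION = r"\b(no|without|negative for)\b[^.]*"

def weak_labels(report):
    """Return the set of findings mentioned affirmatively in a report."""
    text = report.lower()
    positive = re.sub(NEGATION, " ", text)  # crude: drop negated clauses
    return {f for f, pat in FINDINGS.items() if re.search(pat, positive)}

labels = weak_labels("Small nodule in the right lobe. No pleural effusion.")
print(labels)  # {'nodule'}
```

Labels produced this way are noisy, so they are usually treated as weak supervision rather than ground truth.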

Active learning: Active learning aims to reduce the annotation cost by indirectly using the unlabeled data to select the "best" samples to annotate. Generally, data annotation for deep learning requires experts to label data so that the neural network can learn from it. Active learning does not require many labeled samples at the beginning of training; in other words, it can "help" annotators label their data. It uses the knowledge learned from the labeled data to select which unlabeled samples to annotate next, and the newly annotated data is then used to train the network over the following epochs. Active learning [275, 276] is used in medical image analysis in a loop: 1) the algorithm learns from the data annotated by humans, 2) humans annotate the unlabeled data selected by the algorithm, and 3) the algorithm adds the newly labeled data to the training set. The advantage of active learning is obvious: annotators do not need to annotate all of their data, and at the same time, the neural network learns from data faster through this interactive process.
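The selection step of this loop can be sketched with uncertainty sampling, one common active-learning strategy, where the samples whose predicted probabilities are closest to 0.5 are sent to the human annotator; the probabilities below are illustrative:

```python
import numpy as np

def select_queries(model_probs, k):
    """Pick the k unlabeled samples the model is least certain about
    (binary probability closest to 0.5), to be routed to an annotator."""
    uncertainty = -np.abs(model_probs - 0.5)  # higher value = less certain
    return np.argsort(uncertainty)[-k:]       # indices of the k most uncertain

# Probabilities a (hypothetical) classifier assigns to five unlabeled
# scans after training on the current labeled pool.
probs = np.array([0.95, 0.48, 0.10, 0.55, 0.99])
to_annotate = select_queries(probs, k=2)
print(sorted(to_annotate.tolist()))  # → [1, 3] (the two most ambiguous scans)
```

After the annotator labels these scans, they join the training set and the model is retrained, closing steps 1)–3) of the loop above.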

8 Conclusion

In this work, we have provided a comprehensive survey of the datasets and challenges for medical image analysis, collected between 2013 and 2020. The datasets and challenges were categorized into four themes: head and neck, chest and abdomen, pathology and blood, and others. We have summarized the details of these themes and their data. We have also discussed the problems and challenges of medical image analysis, and possible solutions to them.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (62072358), the Zhejiang University special scientific research fund for COVID-19 prevention and control, and the National Key R&D Program of China under Grant No. 2019YFB1311600.

Page 34: A Systematic Collection of Medical Image Datasets for Deep Learning

34 Johann Li et al.

References

[1] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015.

[2] C. Szegedy et al. Going deeper with convolutions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 07-12-June, pp. 1–9, 2015.

[3] K. He et al. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2016-Decem, pp. 770–778, 2016.

[4] D. Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, jan 2016.

[5] O. Vinyals et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, nov 2019.

[6] O. Ronneberger et al. U-net: Convolutional networks for biomedical image segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 9351, pp. 234–241, 2015.

[7] H. Peng et al. Predicting Isocitrate Dehydrogenase (IDH) Mutation Status in Gliomas Using Multiparameter MRI Radiomics Features. Journal of Magnetic Resonance Imaging, 53(5):1399–1407, may 2021.

[8] M. Mehdizadeh et al. Deep feature loss to denoise OCT images using deep neural networks. Journal of Biomedical Optics, 26(04), apr 2021.

[9] B. Wang et al. AI-assisted CT imaging analysis for COVID-19 screening: Building and deploying a medical AI system. Applied Soft Computing, 98:106897, jan 2021.

[10] Z. Shi et al. A clinically applicable deep-learning model for detecting intracranial aneurysm in computed tomography angiography images. Nature Communications, 11(1):6090, dec 2020.

[11] C. P. Mao et al. Altered resting-state functional connectivity and effective connectivity of the habenula in irritable bowel syndrome: A cross-sectional and machine learning study. Human Brain Mapping, 41(13):3655–3666, sep 2020.

[12] N. Liu et al. Learning the Dynamic Treatment Regimes from Medical Registry Data through Deep Q-network. Scientific Reports, 9(1):1495, dec 2019.

[13] S. Khan et al. A Guide to Convolutional Neural Networks for Computer Vision. Synthesis Lectures on Computer Vision, 8(1):1–207, feb 2018.

[14] A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. In Communications of the ACM, volume 60, pp. 84–90, 2017.

[15] K. Clark et al. The cancer imaging archive (TCIA): Maintaining and operating a public information repository. Journal of Digital Imaging, 26(6):1045–1057, dec 2013.

[16] A. L. Goldberger et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 101(23), jun 2000.

[17] D. Bontempi et al. CEREBRUM: a fast and fully-volumetric Convolutional Encoder-decodeR for weakly-supervised sEgmentation of BRain strUctures from out-of-the-scanner MRI. Medical Image Analysis, 62, sep 2020.

[18] Y. Sun et al. Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 Challenge. IEEE Transactions on Medical Imaging, 2021.

[19] L. Wang et al. Benchmark on automatic six-month-old infant brain segmentation algorithms: The iSeg-2017 challenge. IEEE Transactions on Medical Imaging, 38(9):2219–2230, sep 2019.

[20] NEATBrainS - Image Sciences Institute.

[21] A. M. Mendrik et al. MRBrainS Challenge: Online Evaluation Framework for Brain Image Segmentation in 3T MRI Scans. Computational Intelligence and Neuroscience, 2015:1–16, 2015.

[22] I. Išgum et al. Evaluation of automatic neonatal brain segmentation algorithms: The NeoBrainS12 challenge. Medical Image Analysis, 20(1):135–151, feb 2015.

[23] H. J. Kuijf et al. Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge. IEEE Transactions on Medical Imaging, 38(11):2556–2568, nov 2019.

[24] A. Klein and J. Tourville. 101 Labeled Brain Images and a Consistent Human Cortical Labeling Protocol. Frontiers in Neuroscience, 6(DEC), 2012.

[25] B. van Ginneken et al. 3D segmentation in the clinic: A grand challenge. International Conference on Medical Image Computing and Computer Assisted Intervention, 10:7–15, 2007.

[26] J. Li et al. A Baseline Approach for AutoImplant: The MICCAI 2020 Cranial Implant Design Challenge. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12445 LNCS:75–84, jun 2020.

[27] AccelMR 2020 Prediction Challenge – AccelMR 2020 for ISBI 2020.

[28] MRI White Matter Reconstruction | ISBI 2019/2020 MEMENTO Challenge.

[29] R. Souza et al. An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement. NeuroImage, 170:482–494, apr 2018.

[30] J. Zbontar et al. fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv, nov 2018.

[31] R. S. Desikan et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3):968–980, jul 2006.

[32] B. H. Menze et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Transactions on Medical Imaging, 34(10):1993–2024, oct 2015.

[33] S. Bakas et al. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific Data, 4(1):170117, dec 2017.

[34] S. Bakas et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv, nov 2018.

[35] S. Bakas et al. MICCAI BraTS 2017: Scope | Section for Biomedical Image Analysis (SBIA) | Perelman School of Medicine at the University of Pennsylvania, 2017.

[36] O. Maier et al. ISLES 2015 - A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Medical Image Analysis, 35:250–269, jan 2017.

[37] M. D. Hssayeni et al. Intracranial hemorrhage segmentation using a deep convolutional model. Data, 5(1):14, feb 2020.


[38] K. Schmainda and M. Prah. Brain-Tumor-Progression, 2018.

[39] Z. Akkus et al. Predicting Deletion of Chromosomal Arms 1p/19q in Low-Grade Gliomas from MR Images Using Machine Intelligence. Journal of Digital Imaging, 30(4):469–476, aug 2017.

[40] B. Erickson et al. Data From LGG-1p19qDeletion, 2017.

[41] S. L. Liew et al. A large, open source dataset of stroke anatomical brain images and manual lesion segmentations. bioRxiv, 2017.

[42] O. Commowick et al. Objective Evaluation of Multiple Sclerosis Lesion Segmentation using a Data Management and Processing Infrastructure. Scientific Reports, 8(1):13650, dec 2018.

[43] O. Commowick et al. MICCAI 2016 MS lesion segmentation challenge: supplementary results, 2018.

[44] A. Carass et al. Longitudinal multiple sclerosis lesion segmentation: Resource and challenge. NeuroImage, 148:77–102, mar 2017.

[45] A. Carass et al. Longitudinal multiple sclerosis lesion segmentation data resource. Data in Brief, 12:346–350, jun 2017.

[46] D. Scheie et al. Fluorescence in situ hybridization (FISH) on touch preparations: A reliable method for detecting loss of heterozygosity at 1p and 19q in oligodendroglial tumors. American Journal of Surgical Pathology, 30(7):828–837, jul 2006.

[47] S. G. Mueller et al. The Alzheimer's disease neuroimaging initiative. Neuroimaging Clinics of North America, 15(4):869–877, 2005.

[48] P. S. Aisen et al. Alzheimer's Disease Neuroimaging Initiative 2 Clinical Core: Progress and plans. Alzheimer's and Dementia, 11(7):734–739, jul 2015.

[49] M. W. Weiner et al. The Alzheimer's Disease Neuroimaging Initiative 3: Continued innovation for clinical trial improvement. Alzheimer's and Dementia, 13(5):561–571, may 2017.

[50] D. S. Marcus et al. Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience, 19(9):1498–1507, sep 2007.

[51] D. S. Marcus et al. Open access series of imaging studies: Longitudinal MRI data in nondemented and demented older adults. Journal of Cognitive Neuroscience, 22(12):2677–2684, dec 2010.

[52] P. LaMontagne et al. OASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset for Normal Aging and Alzheimer Disease. medRxiv, pp. 2019.12.13.19014902, jan 2019.

[53] H. Varmazyar et al. MRI Hippocampus Segmentation using Deep Learning autoencoders. 2020.

[54] S. Malekzadeh. MRI Hippocampus Segmentation | Kaggle, 2019.

[55] R. V. Marinescu et al. The Alzheimer's disease prediction of longitudinal evolution (TADPOLE) challenge: Results after 1 year follow-up. arXiv, feb 2020.

[56] E. E. Bron et al. Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI: The CADDementia challenge. NeuroImage, 111:562–579, may 2015.

[57] P. Boord et al. Executive attention networks show altered relationship with default mode network in PD. NeuroImage: Clinical, 13:1–8, 2017.

[58] T. M. Madhyastha et al. Dynamic connectivity at rest predicts attention task performance. Brain Connectivity, 5(1):45–59, feb 2015.

[59] C. Tessa. PD De Novo: Resting State fMRI and Physiological Signals, 2018.

[60] C. Tessa et al. Central modulation of parasympathetic outflow is impaired in de novo Parkinson's disease patients. PLoS ONE, 14(1):e0210324, jan 2019.

[61] H. Mori. Diffusion tensor imaging, 2008.

[62] M. Mascalchi et al. Histogram analysis of DTI-derived indices reveals pontocerebellar degeneration and its progression in SCA2. PLoS ONE, 13(7):e0200258, jul 2018.

[63] Z. Fan et al. U-net based analysis of MRI for Alzheimer's disease diagnosis. Neural Computing and Applications, apr 2021.

[64] B. Khagi et al. Alzheimer's disease Classification from Brain MRI based on transfer learning from CNN. In BMEiCON 2018 - 11th Biomedical Engineering International Conference, 2019.

[65] J. B. Bae et al. Identification of Alzheimer's disease using a convolutional neural network model based on T1-weighted magnetic resonance imaging. Scientific Reports, 10(1):22252, dec 2020.

[66] Z. Tang et al. Interpretable classification of Alzheimer's disease pathologies with a convolutional neural network pipeline. Nature Communications, 10(1):2173, dec 2019.

[67] 2020 Alzheimer's disease facts and figures. Alzheimer's and Dementia, 16(3):391–460, mar 2020.

[68] G. Quellec et al. Automatic detection of rare pathologies in fundus photographs using few-shot learning. Medical Image Analysis, 61, 2020.

[69] J. I. Orlando et al. REFUGE Challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs, 2020.

[70] H. Fu et al. AGE Challenge: Angle Closure Glaucoma Evaluation in Anterior Segment Optical Coherence Tomography. arXiv, 2020.

[71] H. Fu et al. ADAM: Automatic Detection challenge on Age-related Macular degeneration, 2020.

[72] P. Porwal et al. Indian Diabetic Retinopathy Image Dataset (IDRiD), 2018.

[73] P. Porwal et al. IDRiD: Diabetic Retinopathy – Segmentation and Grading Challenge. Medical Image Analysis, 59:101561, jan 2020.

[74] P. Porwal et al. Indian diabetic retinopathy image dataset (IDRiD): A database for diabetic retinopathy screening research. Data, 3(3):25, jul 2018.

[75] D. Kermany. Large Dataset of Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images. Mendeley Data, 3, 2018.

[76] D. S. Kermany et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell, 172(5):1122–1131.e9, feb 2018.

[77] H. Al Hajj et al. CATARACTS: Challenge on automatic tool annotation for cataRACT surgery. Medical Image Analysis, 52:24–41, feb 2019.

[78] E. Flouty et al. CaDIS: Cataract dataset for image segmentation. arXiv, jun 2019.

[79] H. Bogunovic et al. RETOUCH: The Retinal OCT Fluid Detection and Segmentation Benchmark and Challenge. IEEE Transactions on Medical Imaging, 38(8):1858–1874, aug 2019.

[80] Diabetic Retinopathy Detection. International Journal of Engineering and Advanced Technology, 9(4):1022–1026, 2020.

[81] S. J. Chiu et al. Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema. Biomedical Optics Express, 6(4):1172, apr 2015.


[82] M. Niemeijer et al. Retinopathy online challenge: Automatic detection of microaneurysms in digital color fundus photographs. IEEE Transactions on Medical Imaging, 29(1):185–195, 2010.

[83] H. Gireesha and N. S. Thyroid Nodule Segmentation And Classification In Ultrasound Images, 2015.

[84] H. J. Aerts et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nature Communications, 5(1):4006, sep 2014.

[85] L. Wee and A. Dekker. Data from Head-Neck-Radiomics-HN1, 2019.

[86] C. E. Cardenas et al. Data from AAPM RT-MAC Grand Challenge 2019, 2019.

[87] M. Vallières et al. Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer. Scientific Reports, 7(1):10117, dec 2017.

[88] M. Vallières et al. Data from Head-Neck-PET-CT, 2017.

[89] P. F. Raudaschl et al. Evaluation of segmentation methods on head and neck CT: Auto-segmentation challenge 2015. Medical Physics, 44(5):2020–2036, may 2017.

[90] X. Yang et al. Automated segmentation of the parotid gland based on atlas registration and machine learning: A longitudinal MRI study in head-and-neck radiation therapy. International Journal of Radiation Oncology Biology Physics, 90(5):1225–1233, dec 2014.

[91] K. Hameeteman et al. Evaluation framework for carotid bifurcation lumen segmentation and stenosis grading. Medical Image Analysis, 15(4):477–488, aug 2011.

[92] S. Zhang et al. Cognitive control of sensory pain encoding in the pregenual anterior cingulate cortex. d1 - decoder construction in day 1, d2 - adaptive control in day 2, 2020.

[93] Y. Ikutani et al. Decoding functional category of source code from the brain (fMRI on Java program comprehension), 2020.

[94] Y. Ikutani et al. Expert programmers have fine-tuned cortical representations of source code. bioRxiv, pp. 2020.01.28.923953, 2020.

[95] R. VanRullen and L. Reddy. Reconstructing faces from fMRI patterns using deep generative neural networks. Communications Biology, 2(1), oct 2019.

[96] I. Alkhasli et al. Modulation of fronto-striatal functional connectivity using transcranial magnetic stimulation. Frontiers in Human Neuroscience, 13, jun 2019.

[97] I. Alkhasli et al. Resting State - TMS, 2019.

[98] G. Shen et al. Deep image reconstruction from human brain activity. PLoS Computational Biology, 15(1):e1006633, jan 2019.

[99] G. Shen et al. Deep Image Reconstruction, 2020.

[100] N. Chang et al. BOLD5000: A public fMRI dataset of 5000 images. arXiv, 6(1):49, sep 2018.

[101] R. J. Lepping et al. Neural processing of emotional musical and nonmusical stimuli in depression. PLoS ONE, 11(6):e0156859, jun 2016.

[102] R. J. Lepping et al. Development of a validated emotionally provocative musical stimulus set for research. Psychology of Music, 44(5):1012–1028, sep 2016.

[103] J. R. Dugré et al. Limbic hyperactivity in response to emotionally neutral stimuli in schizophrenia: A neuroimaging meta-analysis of the hypervigilant mind. American Journal of Psychiatry, 176(12):1021–1029, dec 2019.

[104] Y. Miyawaki et al. Visual Image Reconstruction from Human Brain Activity using a Combination of Multiscale Local Image Decoders. Neuron, 60(5):915–929, dec 2008.

[105] T. Horikawa and Y. Kamitani. Generic decoding of seen and imagined objects using hierarchical visual features. Nature Communications, 8(1):15037, aug 2017.

[106] J. D. Carlin and N. Kriegeskorte. Adjudicating between face-coding models with individual-face fMRI responses. PLoS Computational Biology, 13(7):e1005604, jul 2017.

[107] L. Koenders et al. Grey matter changes associated with heavy cannabis use: A longitudinal sMRI study. PLoS ONE, 11(5):e0152482, may 2016.

[108] V. M. Campello et al. Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Segmentation: The M&Ms Challenge. IEEE Transactions on Medical Imaging, 2020.

[109] N. Heller et al. The KiTS19 challenge data: 300 kidney tumor cases with clinical context, CT semantic segmentations, and surgical outcomes. arXiv, 2019.

[110] N. Heller et al. Data from C4KC-KiTS [Data set], 2019.

[111] X. Zhuang. Multivariate Mixture Model for Myocardial Segmentation Combining Multi-Source Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(12):2933–2946, 2019.

[112] X. Zhuang. Multivariate mixture model for cardiac segmentation from multi-sequence MRI. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 9901 LNCS, pp. 581–588, 2016.

[113] S. Leclerc et al. Deep Learning for Segmentation Using an Open Large-Scale Dataset in 2D Echocardiography. IEEE Transactions on Medical Imaging, 38(9):2198–2210, sep 2019.

[114] B. Rister et al. CT-ORG: A Dataset of CT Volumes With Multiple Organ Segmentations, 2019.

[115] N. Bloch. NCI-ISBI 2013 challenge: automated segmentation of prostate structures, 2015.

[116] P. Bilic et al. The liver tumor segmentation benchmark (LiTS). arXiv, abs/1901.0, 2019.

[117] A. E. Kavur et al. CHAOS Challenge - combined (CT-MR) healthy abdominal organ segmentation, 2021.

[118] A. E. Kavur et al. CHAOS Challenge - combined (CT-MR) healthy abdominal organ segmentation. Medical Image Analysis, 69, jan 2021.

[119] A. E. Kavur et al. Comparison of semi-automatic and deep learning-based automatic methods for liver segmentation in living liver transplant donors. Diagnostic and Interventional Radiology, 26(1):11–21, jan 2020.

[120] R. Trullo et al. Multiorgan segmentation using distance-aware adversarial networks. Journal of Medical Imaging, 6(01):1, 2019.

[121] A. L. Simpson et al. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv, feb 2019.

[122] S. Jaeger et al. Automatic tuberculosis screening using chest radiographs. IEEE Transactions on Medical Imaging, 33(2):233–245, 2014.

[123] S. Candemir et al. Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration. IEEE Transactions on Medical Imaging, 33(2):577–590, 2014.

[124] S. Stirenko et al. Chest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation and Augmentation. In 2018 IEEE 38th International Conference on Electronics and Nanotechnology, ELNANO 2018 - Proceedings, pp. 422–428, 2018.

[125] Z. Xiong et al. A global benchmark of algorithms for segmenting the left atrium from late gadolinium-enhanced cardiac magnetic resonance imaging. Medical Image Analysis, 67:101832, jan 2021.

[126] J. Yang et al. Autosegmentation for thoracic radiation treatment planning: A grand challenge at AAPM 2017. Medical Physics, 45(10):4568–4581, 2018.


[127] J. Yang et al. Data from Lung CT Segmentation Challenge, 2017.

[128] H. R. Roth et al. DeepOrgan: Multi-level deep convolutional networks for automated pancreas segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 9349, pp. 556–564, 2015.

[129] H. Roth et al. Data From Pancreas-CT, 2016.

[130] D. Newitt and N. Hylton. Single site breast DCE-MRI data and segmentations from patients undergoing neoadjuvant chemotherapy, 2016.

[131] D. F. Pace et al. Interactive whole-heart segmentation in congenital heart disease. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 9351, pp. 80–88, 2015.

[132] B. N. Bloch et al. Data From PROSTATE-DIAGNOSIS, 2015.

[133] O. Jimenez-Del-Toro et al. Cloud-Based Evaluation of Anatomical Structure Segmentation and Landmark Detection Algorithms: VISCERAL Anatomy Benchmarks. IEEE Transactions on Medical Imaging, 35(11):2459–2475, nov 2016.

[134] A. Seff et al. Leveraging mid-level semantic boundary cues for automated lymph node detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2015.

[135] A. Seff et al. 2D view aggregation for lymph node detection using a shallow hierarchy of linear classifiers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 8673 LNCS, pp. 544–552, 2014.

[136] H. R. Roth et al. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 8673 LNCS, pp. 520–527, 2014.

[137] H. Roth et al. A new 2.5D representation for lymph node detection in CT, 2015.

[138] A. B. Spanier and L. Joskowicz. Rule-based ventral cavity multi-organ automatic segmentation in CT scans. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 8848, pp. 163–170, 2014.

[139] C. Tobon-Gomez et al. Benchmark for Algorithms Segmenting the Left Atrium From 3D CT and MRI Datasets. IEEE Transactions on Medical Imaging, 34(7):1460–1473, 2015.

[140] G. Litjens et al. Evaluation of prostate segmentation algorithms for MRI: The PROMISE12 challenge. Medical Image Analysis, 2014.

[141] G. Litjens and J. Futterer. Data From Prostate-3T, 2015.

[142] S. Balocco et al. Standardized evaluation methodology and reference database for evaluating IVUS image segmentation. Computerized Medical Imaging and Graphics, 38(2):70–90, mar 2014.

[143] P. Lo et al. Extraction of airways from CT (EXACT'09). IEEE Transactions on Medical Imaging, 31(11):2093–2107, 2012.

[144] T. Heimann et al. Comparison and evaluation of methods for liver segmentation from CT datasets. IEEE Transactions on Medical Imaging, 28(8):1251–1265, 2009.

[145] J. Zhao et al. COVID-CT-Dataset: A CT image dataset about COVID-19. arXiv, mar 2020.

[146] L. L. Wang et al. CORD-19: The COVID-19 open research dataset. arXiv, apr 2020.

[147] G. Maguolo and L. Nanni. A Critic Evaluation of Methods for COVID-19 Automatic Detection from X-Ray Images. arXiv, apr 2020.

[148] E. Tartaglione et al. Unveiling COVID-19 from chest x-ray with deep learning: A hurdles race with small data. International Journal of Environmental Research and Public Health, 17(18):1–17, apr 2020.

[149] J. Born et al. Accelerating COVID-19 differential diagnosis with explainable ultrasound image analysis. arXiv, sep 2020.

[150] J. Born et al. POCOVID-Net: Automatic detection of COVID-19 from a new lung ultrasound imaging dataset (POCUS). arXiv, apr 2020.

[151] L. Wang et al. COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Scientific Reports, 10(1):19549, dec 2020.

[152] M. E. Chowdhury et al. Can AI Help in Screening Viral and COVID-19 Pneumonia? IEEE Access, 8:132665–132676, 2020.

[153] S. A. Harmon et al. Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets. Nature Communications, 11(1):4080, dec 2020.

[154] P. An et al. CT Images in COVID-19 [Data set], 2020.

[155] S. Desai et al. Chest imaging representing a COVID-19 positive rural U.S. population, 2020.

[156] S. Desai et al. Chest imaging representing a COVID-19 positive rural U.S. population. Scientific Data, 7(1):414, dec 2020.

[157] M. Buda et al. Detection of masses and architectural distortions in digital breast tomosynthesis: a publicly available dataset of 5,060 patients and a deep learning model. 2020.

[158] M. Buda et al. Breast Cancer Screening – Digital Breast Tomosynthesis (BCS-DBT), 2020.

[159] P. Li, S. Wang, T. Li, J. Lu, Y. HuangFu, and D. Wang. A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis [Data set], 2020.

[160] J. Irvin et al. CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, 33:590–597, jan 2019.

[161] L. Wee et al. Data from NSCLC-Radiomics-Interobserver1, 2019.

[162] A. E. Johnson et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data, 6(1), 2019.

[163] A. E. W. Johnson et al. The MIMIC-CXR Database, 2019.

[164] M. Rusu et al. Co-registration of pre-operative CT with ex vivo surgically excised ground glass nodules to define spatial extent of invasive adenocarcinoma on in vivo imaging: a proof-of-concept study. European Radiology, 27(10):4209–4217, 2017.

[165] A. Madabhushi and M. Rusu. Fused Radiology-Pathology Lung Dataset, 2018.

[166] O. Bernard et al. Deep Learning Techniques for Automatic MRI Cardiac Multi-Structures Segmentation and Diagnosis: Is the Problem Solved? IEEE Transactions on Medical Imaging, 37(11):2514–2525, nov 2018.


[167] O. Gevaert et al. Non-small cell lung cancer: Identifyingprognostic imaging biomarkers by leveraging public geneexpression microarray data - Methods and preliminaryresults. Radiology, 264(2):387–396, aug 2012.

[168] L. Kostakoglu et al. A phase II study of 3-Deoxy-3-18F-fluorothymidine PET in the assessment of early re-sponse of breast cancer to neoadjuvant chemotherapy:Results from ACRIN 6688. Journal of Nuclear Medicine,56(11):1681–1689, nov 2015.

[169] P. Kinahan et al. Data from ACRIN-FLT-Breast, 2017.[170] X. Wang et al. ChestX-ray8: Hospital-scale chest X-ray

database and benchmarks on weakly-supervised classifi-cation and localization of common thorax diseases. InProceedings - 30th IEEE Conference on Computer Vi-sion and Pattern Recognition, CVPR 2017, volume 2017-Janua, pp. 3462–3471, 2017.

[171] G. Litjens et al. Computer-aided detection of prostatecancer in MRI. IEEE Transactions on Medical Imaging,33(5):1083–1092, may 2014.

[172] G. Litjens et al. ProstateX challenge data, 2017.[173] A. A. A. Setio et al. Validation, comparison, and com-

bination of algorithms for automatic detection of pul-monary nodules in computed tomography images: TheLUNA16 challenge. Medical Image Analysis, 42:1–13, dec2017.

[174] M. Kachelrieß et al. Flying focal spot (FFS) in cone-beamCT. IEEE Transactions on Nuclear Science, 53(3):1238–1247, 2006.

[175] T. G. Flohr et al. Image reconstruction and image qualityevaluation for a 64-slice CT scanner with z-flying focalspot. Medical Physics, 32(8):2536–2547, 2005.

[176] B. Zhao et al. Evaluating variability in tumor measure-ments from same-day repeat CT scans of patients withnon-small cell lung cancer. Radiology, 252(1):263–272, jul2009.

[177] B. Zhao et al. Data From RIDER_Lung CT, 2015.[178] M. A. Gavrielides et al. Data From Phantom_FDA, 2015.[179] M. A. Gavrielides et al. A resource for the assessment

of lung nodule size estimation methods: database of tho-racic CT scans of an anthropomorphic phantom. OpticsExpress, 18(14):15244, jul 2010.

[180] B. N. Bloch et al. Data From BREAST-DIAGNOSIS,2015.

[181] M. Vallières et al. A radiomics model from joint FDG-PET and MRI texture features for the prediction oflung metastases in soft-tissue sarcomas of the extremi-ties. Physics in Medicine and Biology, 60(14):5471–5496,jul 2015.

[182] S. G. Armato et al. LUNGx Challenge for computerizedlung nodule classification. Journal of Medical Imaging,3(4):044506, dec 2016.

[183] S. G. Armato et al. Guest Editorial: LUNGx Chal-lenge for computerized lung nodule classification: reflec-tions and lessons learned. Journal of Medical Imaging,2(2):020103, jun 2015.

[184] R. Armato III, S. G., Hadjiiski, L., Tourassi, G.D.,Drukker, K., Giger, M.L., Li, F. and L. G., Farahani, K.,Kirby, J.S. and Clarke. SPIE-AAPM-NCI Lung NoduleClassification Challenge, 2015.

[185] S. K. Napel, Sandy, & Plevritis. NSCLC Radiogenomics:Initial Stanford Study of 26 Cases. The Cancer ImagingArchive., 2014.

[186] S. G. Armato et al. The Lung Image Database Con-sortium (LIDC) and Image Database Resource Initiative(IDRI): A completed reference database of lung noduleson CT scans. Medical Physics, 38(2):915–931, jan 2011.

[187] A. III et al. Data From LIDC-IDRI, 2015.

[188] C. K. Smith K et al. Data FromCT_COLONOGRAPHY, 2015.

[189] C. D. Johnson et al. Accuracy of CT Colonography forDetection of Large Adenomas and Cancers. New EnglandJournal of Medicine, 359(12):1207–1217, sep 2008.

[190] B. van Ginneken et al. Comparing and combining algo-rithms for computer-aided detection of pulmonary nod-ules in computed tomography scans: The ANODE09study. Medical Image Analysis, 14(6):707–722, 2010.

[191] S. Ali et al. An objective comparison of detection and seg-mentation algorithms for artefacts in clinical endoscopy.Scientific Reports, 10(1):2748, dec 2020.

[192] S. Ali et al. Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy. Medical Image Analysis, 70, 2021.

[193] V. S. Bawa et al. ESAD: Endoscopic surgeon action detection dataset. arXiv, jun 2020.

[194] S. Ali et al. Endoscopy artifact detection (EAD 2019) challenge dataset. arXiv, may 2019.

[195] T. L. A. van den Heuvel et al. Automated measurement of fetal head circumference using 2D ultrasound images. PLOS ONE, 13(8):e0200412, aug 2018.

[196] R. Karim et al. Algorithms for left atrial wall segmentation and thickness – Evaluation on an open-source CT and MRI image database. Medical Image Analysis, 50:36–53, dec 2018.

[197] H. A. Kirişli et al. Standardized evaluation framework for evaluating coronary artery stenosis detection, stenosis quantification and lumen segmentation algorithms in computed tomography angiography. Medical Image Analysis, 17(8):859–876, dec 2013.

[198] S. Rueda et al. Evaluation and comparison of current fetal ultrasound image segmentation methods for biometric measurements: A grand challenge. IEEE Transactions on Medical Imaging, 33(4):797–813, 2014.

[199] X. He et al. PathVQA: 30000+ questions for medical visual question answering. arXiv, mar 2020.

[200] R. Verma et al. Multi-organ Nuclei Segmentation and Classification Challenge 2020. pp. 1–3, feb 2020.

[201] M. Peikari et al. Automatic cellularity assessment from post-treated breast surgical specimens. Cytometry Part A, 91(11):1078–1087, 2017.

[202] A. L. Martel et al. Assessment of Residual Breast Cancer Cellularity after Neoadjuvant Chemotherapy using Digital Pathology [Data set], 2019.

[203] P. Leavey et al. Osteosarcoma data from UT Southwestern/UT Dallas for Viable and Necrotic Tumor Assessment, 2019.

[204] R. Mishra et al. Histopathological diagnosis for viable and non-viable tumor prediction for osteosarcoma using convolutional neural network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 10330 LNBI, pp. 12–23, 2017.

[205] P. Leavey et al. American Society of Pediatric Hematology/Oncology (ASPHO), Palais des congrès de Montréal, Montréal, Canada, April 26–29, 2017. Pediatric Blood & Cancer, 64, 2017.

[206] H. B. Arunachalam et al. Computer aided image segmentation and classification for viable and non-viable tumor identification in osteosarcoma. In Pacific Symposium on Biocomputing, pp. 195–206, 2017.

[207] R. Mishra et al. Convolutional neural network for histopathological analysis of osteosarcoma. Journal of Computational Biology, 25:313–325, 2018.


[208] J. Li et al. Signet Ring Cell Detection with a Semi-supervised Learning Framework. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11492 LNCS:842–854, jul 2019.

[209] Z. Swiderska-Chadaj et al. Learning to detect lymphocytes in immunohistochemistry with deep learning. Medical Image Analysis, 58, 2019.

[210] J. Borovec et al. ANHIR: Automatic Non-Rigid Histological Image Registration Challenge. IEEE Transactions on Medical Imaging, 39(10):3042–3052, 2020.

[211] J. Borovec. BIRL: Benchmark on Image Registration Methods with Landmark Validation. arXiv, dec 2019.

[212] G. Campanella et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine, 25(8):1301–1309, 2019.

[213] G. Campanella et al. Breast Metastases to Axillary Lymph Nodes, 2019.

[214] Z. Li et al. Computer-aided diagnosis of lung carcinoma using deep learning – a pilot study. arXiv, mar 2018.

[215] N. Kumar et al. A Multi-Organ Nucleus Segmentation Challenge. IEEE Transactions on Medical Imaging, 39(5):1380–1391, may 2020.

[216] B. S. Veeling et al. Rotation equivariant CNNs for digital pathology. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11071 LNCS:210–218, jun 2018.

[217] B. E. Bejnordi et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA - Journal of the American Medical Association, 318(22):2199–2210, dec 2017.

[218] G. Aresta et al. BACH: Grand challenge on breast cancer histology images. Medical Image Analysis, 56:122–139, aug 2019.

[219] G. Litjens et al. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: The CAMELYON dataset. GigaScience, 7(6), jun 2018.

[220] P. Bándi et al. From Detection of Individual Metastases to Classification of Lymph Node Status at the Patient Level: The CAMELYON17 Challenge. IEEE Transactions on Medical Imaging, 38(2):550–560, feb 2019.

[221] V. Ulman et al. An objective comparison of cell-tracking algorithms. Nature Methods, 14(12):1141–1152, dec 2017.

[222] D. Wei et al. MitoEM Dataset: Large-Scale 3D Mitochondria Instance Segmentation from EM Images. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 12265 LNCS, pp. 66–76, 2020.

[223] C. Matek et al. Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks. Nature Machine Intelligence, 1(11):538–544, nov 2019.

[224] C. Matek et al. A Single-cell Morphological Dataset of Leukocytes from AML Patients and Non-malignant Controls [Data set], 2019.

[225] Y. Pan et al. Neighborhood-Correction Algorithm for Cells. Lecture Notes in Bioengineering. Springer Singapore, Singapore, 2019.

[226] S. Gehlot et al. SDCT-AuxNetθ: DCT augmented stain deconvolutional CNN with auxiliary classifier for cancer diagnosis. Medical Image Analysis, 61:101661, apr 2020.

[227] S. Goswami et al. Heterogeneity loss to handle intersubject and intrasubject variability in cancer. arXiv, mar 2020.

[228] A. Gupta et al. PCSeg: Color model driven probabilistic multiphase level set based tool for plasma cell segmentation in multiple myeloma. PLoS ONE, 13(12), 2018.

[229] R. Gupta et al. Stain Color Normalization and Segmentation of Plasma Cells in Microscopic Images as a Prelude to Development of Computer Assisted Automated Disease Diagnostic Tool in Multiple Myeloma. Clinical Lymphoma Myeloma and Leukemia, 17(1):e99, feb 2017.

[230] R. Gupta and A. Gupta. MiMM_SBILab Dataset: Microscopic Images of Multiple Myeloma, 2019.

[231] R. Duggal et al. SD-Layer: Stain deconvolutional layer for CNNs in medical microscopic imaging. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10435 LNCS:435–443, 2017.

[232] R. Duggal et al. Overlapping cell nuclei segmentation in microscopic images using deep belief networks. In ACM International Conference Proceeding Series, 2016.

[233] N. Honomichl. SN-AM Dataset: White Blood cancer dataset of B-ALL and MM for stain normalization, 2019.

[234] L. Jin et al. Deep-learning-assisted detection and segmentation of rib fractures from CT scans: Development and validation of FracNet. EBioMedicine, 62:103106, dec 2020.

[235] M. T. Löffler et al. A Vertebral Segmentation Dataset with Fracture Grading. Radiology: Artificial Intelligence, 2(4):e190138, 2020.

[236] A. Sekuboyina et al. VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images. arXiv, jan 2020.

[237] A. A. Yorke et al. Pelvic Reference Data, 2019.

[238] N. Bien et al. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet. PLoS Medicine, 15(11):e1002699, nov 2018.

[239] P. Rajpurkar et al. MURA: Large dataset for abnormality detection in musculoskeletal radiographs. arXiv, dec 2017.

[240] Y. Song et al. Bone texture characterization with fisher encoding of local descriptors. In Proceedings - International Symposium on Biomedical Imaging, volume 2015-July, pp. 5–8. IEEE, apr 2015.

[241] A. Sekuboyina et al. Labeling Vertebrae with Two-dimensional Reformations of Multidetector CT Images: An Adversarial Approach for Incorporating Prior Knowledge of Spine Anatomy. Radiology: Artificial Intelligence, 2(2):e190074, 2020.

[242] B. Cassidy et al. DFUC 2020: Analysis towards diabetic foot ulcer detection. arXiv, apr 2020.

[243] International Skin Imaging Collaboration. SIIM-ISIC 2020 Challenge Dataset, 2020.

[244] P. Tschandl et al. Data descriptor: The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data, 5(1):180161, dec 2018.

[245] N. Codella et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (ISIC). arXiv, feb 2019.

[246] N. C. Codella et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 International symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC). Proceedings - International Symposium on Biomedical Imaging, 2018-April:168–172, oct 2018.


[247] R. B. Ger et al. Synthetic head and neck and phantom images for determining deformable image registration accuracy in magnetic resonance imaging. Medical Physics, 45(9):4315–4321, sep 2018.

[248] R. Ger et al. Data from Synthetic and Phantom MR Images for Determining Deformable Image Registration Accuracy (MRI-DIR), 2018.

[249] R. Ger et al. Data from CT Phantom Scans for Head, Chest, and Controlled Protocols on 100 Scanners (CC-Radiomics-Phantom-3), 2019.

[250] R. B. Ger et al. Comprehensive Investigation on Controlling for CT Imaging Variabilities in Radiomics Studies. Scientific Reports, 8(1):13047, dec 2018.

[251] M. Shafiq-ul-Hassan et al. Credence Cartridge Radiomics Phantom CT Scans with Controlled Scanning Approach (CC-Radiomics-Phantom-2), 2018.

[252] D. Mackin et al. Data From Credence Cartridge Radiomics Phantom CT Scans, 2017.

[253] P. Muzi et al. Data From RIDER_PHANTOM_PET-CT, 2015.

[254] N. Takata et al. Mouse_awake_rest, 2020.

[255] N. A. Nadkarni et al. MouseLemurAtlas_MRIraw, 2019.

[256] N. A. Nadkarni et al. A 3D population-based brain atlas of the mouse lemur primate with examples of applications in aging studies and comparative anatomy. NeuroImage, 185:85–95, jan 2019.

[257] National Cancer Institute Clinical Proteomic Tumor Analysis Consortium. Radiology Data from the Clinical Proteomic Tumor Analysis Consortium Glioblastoma Multiforme [CPTAC-GBM] collection, 2018.

[258] G. Muñoz-Gil et al. AnDi: The anomalous diffusion challenge, 2020.

[259] A. Dalca et al. Learn2Reg - The Challenge, 2020.

[260] L. Zhang et al. MeDaS: An open-source platform as service to help break the walls between medicine and informatics. arXiv, jul 2020.

[261] A. Demircioğlu et al. Detecting the pulmonary trunk in CT scout views using deep learning. Scientific Reports, 11(1):10215, dec 2021.

[262] J. K. Kim et al. Prediction of ambulatory outcome in patients with corona radiata infarction using deep learning. Scientific Reports, 11(1):7989, dec 2021.

[263] J. Yim et al. Predicting conversion to wet age-related macular degeneration using deep learning. Nature Medicine, 26(6):892–899, jun 2020.

[264] L. Zhang et al. Block Level Skip Connections across Cascaded V-Net for Multi-Organ Segmentation. IEEE Transactions on Medical Imaging, 39(9):2782–2793, 2020.

[265] J. Deng et al. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE, jun 2009.

[266] Y. LeCun et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[267] T. Y. Lin et al. Microsoft COCO: Common objects in context. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8693 LNCS(PART 5):740–755, may 2014.

[268] P. Li et al. A Dataset of Pulmonary Lesions With Multiple-Level Attributes and Fine Contours. Frontiers in Digital Health, 2, feb 2021.

[269] A. K. Mondal et al. Few-shot 3D multi-modal medical image segmentation using generative adversarial learning. arXiv, oct 2018.

[270] M. Rezaei and M. Shahidi. Zero-Shot Learning and Its Applications From Autonomous Vehicles to Covid-19 Diagnosis: A Review. arXiv, apr 2020.

[271] M. J. Sheller et al. Multi-institutional deep learning modeling without sharing patient data: A feasibility study on brain tumor segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 11383 LNCS, pp. 92–104, 2019.

[272] H. Liang et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nature Medicine, 25(3):433–438, mar 2019.

[273] N. Viani et al. A natural language processing approach for identifying temporal disease onset information from mental healthcare text. Scientific Reports, 11(1):757, dec 2021.

[274] Y. Kim et al. Validation of deep learning natural language processing algorithm for keyword extraction from pathology reports in electronic health records. Scientific Reports, 10(1):20265, dec 2020.

[275] W. Shao et al. Deep active learning for nucleus classification in pathology images. In Proceedings - International Symposium on Biomedical Imaging, volume 2018-April, pp. 199–202. IEEE, apr 2018.

[276] J. Carse and S. McKenna. Active Learning for Patch-Based Digital Pathology Using Convolutional Neural Networks to Reduce Annotation Costs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 11435 LNCS, pp. 20–27, 2019.