
SCALING UP IMAGE ANNOTATION FOR DEEP LEARNING: STANDARDS, LABELS FROM TEXT, AND LEVERAGING MULTI-INSTITUTIONAL DATA

Daniel L. Rubin, MD, MS

Professor of Biomedical Data Science, Radiology, Medicine (Biomedical Informatics), and Ophthalmology (by courtesy)

Stanford University

Acknowledgements

Students, Post-docs, Residents, Staff, and Collaborators
– Bao Do
– Selen Bozkurt
– Assaf Hoogi
– Alfiia Galimzianova
– Imon Banerjee
– Christopher Re
– Sandy Napel
– Chris Beaulieu
– Darvin Yi
– Xuerong Xiao
– Carson Lam
– Blaine Rister
– Hersh Sagreiya
– Emel Alkim
– Ann Leung
– Matthew Lungren
– Jared Dunnmon
– David Conn
– Mete Akdogan
– Niranjan Balachandar
– Curt Langlotz
– Ted Leng
– Joelle Hallak
– Luis de Sisternes
– Zaid Nabulsi
– Michael Gensheimer

Funding Support
– NCI QIN grants U01CA142555, 1U01CA190214, 1U01CA187947
– Stanford-AstraZeneca Collaboration Grant
– NVIDIA Academic Hardware Grant Program
– Stanford Philips and GE BlueSky

Challenges to scaling up image annotation for deep learning
• Varying data/file formats for saving image annotations
• Difficulty leveraging free text radiology reports as a source for labels for images
• Hurdles to sharing data across institutions to build more robust AI models

Image annotations are crucial for AI

[Figure: annotated image with regions ROI1 and ROI2; tasks shown: Detection, Segmentation, Classification, Diagnosis]

Varying file formats for image annotations
• Regions of interest (ROIs) and image labels
◦ DICOM-PS
◦ Burned-in image
◦ Proprietary formats
• Clinical labels (diagnoses, findings, patient outcomes)
◦ EMR
◦ Spreadsheets
◦ Delimited files
◦ Proprietary formats

Lack of image annotation standards thwarts interoperability

[Figure: annotation outputs from Vendor 1, Vendor 2, Vendor 3, Vendor 4, and 3D Slicer]

Copyright © Daniel Rubin 2015

Annotation and Image Markup (AIM)
• XML schema to make the information that humans and machines see in images machine-accessible in a standard format
• Enables interoperability of this information across systems and computer applications
• Developed by the National Cancer Imaging Program at NCI
• Harmonized/incorporated into DICOM-SR

Rubin DL, et al.: Medical Imaging on the Semantic Web: Annotation and Image Markup, AAAI 2008.
https://wiki.nci.nih.gov/display/AIM/Annotation+and+Image+Markup+-+AIM


AIM captures annotations in XML


QUALITATIVE

QUANTITATIVE

Anatomic Entity: Upper lobe of left lung (RID1327)
Observation: Mass (RID3874)
Characteristic: Microlobulated margin (RID5712)
Geometric Shape: Polyline
2D coordinates: {(x,y), (x,y)….}
Calculation: Largest diameter result: 2.8 cm
Diagnosis: Lung cancer
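To make the idea of a machine-accessible annotation concrete, the fields listed above could be serialized roughly as follows. This is only a sketch: the field values come from the slide, but the element and attribute names are hypothetical and do not follow the actual AIM schema, and the polyline vertices are placeholders because the slide elides the real coordinates.

```python
import xml.etree.ElementTree as ET

# Field values are taken from the slide; the element and attribute
# names below are hypothetical and are NOT the real AIM schema.
annotation = ET.Element("ImageAnnotation")

anatomy = ET.SubElement(annotation, "AnatomicEntity", code="RID1327")
anatomy.text = "Upper lobe of left lung"

observation = ET.SubElement(annotation, "Observation", code="RID3874")
observation.text = "Mass"

characteristic = ET.SubElement(annotation, "Characteristic", code="RID5712")
characteristic.text = "Microlobulated margin"

# Placeholder polyline vertices; the slide elides the actual coordinates.
shape = ET.SubElement(annotation, "GeometricShape", type="Polyline")
for x, y in [(10.0, 12.5), (14.0, 18.0), (9.5, 20.0)]:
    ET.SubElement(shape, "Point", x=str(x), y=str(y))

calc = ET.SubElement(annotation, "Calculation", name="Largest diameter", unit="cm")
calc.text = "2.8"

diagnosis = ET.SubElement(annotation, "Diagnosis")
diagnosis.text = "Lung cancer"

# Serialize to a string, as an annotation tool might when saving a file.
xml_text = ET.tostring(annotation, encoding="unicode")

# Any XML-aware program can read the values back without touching the image.
parsed = ET.fromstring(xml_text)
largest_diameter_cm = float(parsed.find("Calculation").text)
observation_code = parsed.find("Observation").get("code")
```

The point of the sketch is that both the qualitative codes and the quantitative measurement become queryable fields rather than pixels burned into an image or entries in a vendor-specific file.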

AIM annotations interoperate with other standards

[Figure: AIM shown alongside DICOM SR (TID 1500), XML, and HL7 CDA/FHIR]


https://github.com/NCIP/annotation-and-image-markup/tree/master/AIMToolkit_v3.0.2_rv11/examples/ANIVATR

electronic Physician Annotation Device (ePAD)
• ePAD: free, open source Web-based image viewer and annotator
• AIM-compliant annotation; supports AIM templates
• Plugins for quantifying lesion features

[Figure: ePAD screenshot with callouts: Template, ROI, Values, Quantitative image features, Qualitative image features, Annotations linked to images]

Rubin, Willrett, O'Connor, Hage, Kurtz, Moreira, Translational Oncology 7(1):23-35, 2014
http://epad.stanford.edu

AIM being used for public sharing of image annotations
• The Cancer Genome Atlas (TCGA) imaging projects
◦ Brain cancer
◦ Breast cancer
◦ Bladder cancer
• The Cancer Imaging Archive (TCIA)
• Quantitative Imaging Network (QIN) of NCI

Copyright © Stanford University 2018

Motivating challenges for needing to use free text reports
• Scarcity of annotated images: need millions of images to train a complex neural network
• Annotation is laborious, time-consuming, and expensive
• Radiology reports are associated with routine clinical images that could be leveraged

Radiological image annotation: leveraging clinical notes
• PACS contains millions of images “labeled” in the form of unstructured notes.
• Why not use the notes for annotating the images?
• Unstructured free text cannot be directly interpreted by a machine due to the ambiguity and subtlety of natural language.
• How can the semantic information be extracted from the clinical notes?

[Figure: CT image alongside the radiologist’s note]


Word embeddings to identify annotation labels from narrative text

Unsupervised deep learning algorithms (e.g., word2vec) can learn a feature representation from texts without the need of supplying specific domain knowledge

Word embedding using deep learning (4,442 words) projected in two dimensions

Imon Banerjee, JDI 30:506-518, 2017

Ontocrawler: Generating domain dictionaries for annotation tasks
• Created an ontology crawler using SPARQL that grabs the sub-classes and synonyms of the domain-specific terms from NCBO BioPortal.
• Generates a focused dictionary for each domain of radiology.
• {‘apoplexy’, ‘contusion’, ‘hematoma’, ...} → ‘hemorrhage’
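Once such a focused dictionary exists, applying it to report text can be as simple as a lookup that rewrites each synonym to its canonical term. The sketch below assumes a hand-written dictionary holding only the three synonyms shown on the slide; the names `DOMAIN_DICTIONARY` and `map_to_canonical` are hypothetical, and the actual crawler output and tokenization used in the work may differ.

```python
import re

# Hypothetical focused dictionary: the three synonyms come from the slide;
# a real dictionary would be generated by the ontology crawler.
DOMAIN_DICTIONARY = {
    "hemorrhage": {"apoplexy", "contusion", "hematoma"},
}

# Invert to a lookup table: synonym -> canonical term.
SYNONYM_TO_TERM = {
    synonym: term
    for term, synonyms in DOMAIN_DICTIONARY.items()
    for synonym in synonyms
}

def map_to_canonical(text):
    """Lowercase, keep alphabetic tokens, and replace known synonyms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [SYNONYM_TO_TERM.get(token, token) for token in tokens]

tokens = map_to_canonical("Small subdural hematoma, no contusion.")
```

Collapsing synonyms this way before training an embedding means the model sees one frequent token instead of several rare ones.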


Intelligent word embedding pipeline


Word embedding + classification model
• Stores each word as a point in vector space
• Unsupervised, built just by reading a huge corpus
• Can be used as features to train a supervised model with a small subset of annotations
• Reusable/extensible to many text extraction use cases

[Figure: pipeline with stages labeled Corpus, Word embedding, Document embedding, Classifier, and Document classification (Positive / Negative)]

Mikolov, Distributed representations of words and phrases and their compositionality
Imon Banerjee, In preparation
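The word embedding, document embedding, and classifier stages can be illustrated with a toy example. This sketch makes several assumptions that are not from the slides: the 2-D word vectors are invented stand-ins for learned word2vec vectors, the document embedding is a plain average of word vectors, and the classifier is a nearest-centroid rule chosen for brevity rather than any classifier used in the work.

```python
import math

# Toy 2-D word vectors; hypothetical values standing in for vectors
# learned by an unsupervised model such as word2vec.
WORD_VECTORS = {
    "hemorrhage": (0.9, 0.1),
    "acute": (0.8, 0.2),
    "normal": (0.0, 1.0),
    "unremarkable": (0.1, 0.8),
}

def document_embedding(tokens):
    """Average the vectors of the known words in a document."""
    vectors = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))

def train_centroids(labeled_docs):
    """Compute one mean document vector per label (nearest-centroid classifier)."""
    by_label = {}
    for tokens, label in labeled_docs:
        by_label.setdefault(label, []).append(document_embedding(tokens))
    return {
        label: tuple(sum(axis) / len(vecs) for axis in zip(*vecs))
        for label, vecs in by_label.items()
    }

def classify(tokens, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    doc = document_embedding(tokens)
    return min(centroids, key=lambda label: math.dist(doc, centroids[label]))

# A small labeled subset is enough to fit the supervised stage.
training = [
    (["acute", "hemorrhage"], "Positive"),
    (["normal", "unremarkable"], "Negative"),
]
centroids = train_centroids(training)
positive_prediction = classify(["hemorrhage"], centroids)
negative_prediction = classify(["normal"], centroids)
```

Simple averaging ignores word order and negation, so a phrase such as "no hemorrhage" would need additional handling in practice.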

Example 1: Head CT
• Task: Label intracranial hemorrhage based on radiology report
• Dataset:
◦ 10,000 CT reports from Stanford
◦ ~900 CT reports from UPMC
• Gold-standard annotation:
◦ Subset of 1,188 reports labeled independently by two radiologists (agreement ~0.98 kappa score)
• Classification labels:
◦ No intracranial hemorrhage
◦ Diagnosis of intracranial hemorrhage unlikely, though cannot be completely excluded
◦ Diagnosis of intracranial hemorrhage possible
◦ Diagnosis of intracranial hemorrhage probable, but not definitive
◦ Definite intracranial hemorrhage

Banerjee, Imon, Sriraman Madhavan, Roger Eric Goldman, and Daniel L. Rubin, AMIA Annual Symposium Proceedings, vol. 2017, p. 411. American Medical Informatics Association, 2017.
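For readers unfamiliar with the kappa score quoted for the gold-standard annotation, the sketch below computes Cohen's kappa for two raters. It assumes the reported agreement is Cohen's kappa, which the slide does not state, and the two label lists are invented toy data, not the study's annotations.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each rater's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two readers on six reports.
reader_1 = ["none", "none", "definite", "possible", "definite", "none"]
reader_2 = ["none", "none", "definite", "possible", "possible", "none"]
kappa = cohen_kappa(reader_1, reader_2)
```

In this toy case the readers agree on 5 of 6 reports and chance agreement is 13/36, giving a kappa of 17/23, or about 0.74; a value near 0.98 indicates almost complete agreement.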

Comparative performance
1. Out-of-box word2vec – without semantic mapping
2. Proposed model – with semantic mapping

                           Out-of-box word2vec               Proposed model
Classifier                 Precision  Recall   F1-score      Precision  Recall   F1-score
Random Forest              87.59%     89.17%   87.78%        88.64%     90.42%   89.08%
KNN (n = 10)               86.73%     88.90%   87.47%        88.60%     89.91%   88.88%
KNN (n = 5)                87.52%     88.65%   87.74%        88.54%     89.62%   88.76%
SVM (Radial kernel)        63.98%     79.96%   71.07%        64.19%     80.09%   71.25%
SVM (Polynomial kernel)    62.40%     78.97%   69.70%        63.25%     79.49%   70.43%


Example 2: Chest CT
• Task: Label pulmonary embolism based on radiology report
• Dataset:
◦ 100k+ de-identified chest CT reports (Stanford and UPMC)
• Baseline comparison:
◦ Compare to published state-of-the-art rule-based method for PE extraction (PeFinder)
• Classification labels:
◦ PE acute (positive)
◦ PE present (positive)
◦ PE subsegmental only (negative)


Banerjee, Imon, Matthew C. Chen, Matthew P. Lungren, and Daniel L. Rubin. "Radiology report annotation using intelligent word embeddings: Applied to multi-institutional chest CT cohort." Journal of biomedical informatics 77 (2018): 11-20.

ROC curve measures

[Figure: ROC curves for the Stanford dataset and the UPMC dataset]


Example 3: Mammography
• Task: Label BI-RADS final assessment category based on findings of radiology report
• Dataset:
◦ 300K mammography reports
• Baseline comparison:
◦ Published rule-based information extraction method (J Biomed Inform 62:224-31, 2016)
• Classification labels:
◦ BI-RADS Class 0 - 6


Results: Comparison with a Rule-based method

*Rule-based system: J Biomed Inform. 62:224-31, 2016


Centralized approach to AI model development

[Figure: central AI Model; barriers labeled: Legal issues, Intellectual Property]


Big Data aggregation without data sharing
• No data sharing required

[Figure: an initiating site connected to Site 1, Site 2, and Site 3. Initiating site: P(Data|coefficients); update parameters. Each site: fit model with input parameters; return coefficients. Iterate…]

Courtesy Phil Lavori
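The iterate-and-update loop described for this slide can be sketched for a one-feature logistic regression. This is an illustration under stated assumptions, not the method from the slide: here each site returns its gradient contribution rather than fitted coefficients, the initiating site is assumed to know the total sample count, and the function names and toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def site_gradient(coefficients, local_data):
    """Runs at a site: gradient of the local log-likelihood.
    Only this summary leaves the site, never the patient-level rows."""
    w, b = coefficients
    grad_w = grad_b = 0.0
    for x, y in local_data:
        error = y - sigmoid(w * x + b)
        grad_w += error * x
        grad_b += error
    return grad_w, grad_b

def fit_distributed(sites, steps=500, learning_rate=0.1):
    """Runs at the initiating site: send coefficients, sum the
    returned gradients, update the parameters, and iterate."""
    w, b = 0.0, 0.0
    n_total = sum(len(data) for data in sites)  # assumed known to the initiator
    for _ in range(steps):
        grads = [site_gradient((w, b), data) for data in sites]
        w += learning_rate * sum(g[0] for g in grads) / n_total
        b += learning_rate * sum(g[1] for g in grads) / n_total
    return w, b

# Hypothetical (feature, label) rows held separately at three sites.
sites = [
    [(0.5, 0), (1.0, 0), (2.5, 1)],
    [(1.5, 0), (3.0, 1), (2.0, 1)],
    [(0.8, 0), (2.2, 1), (1.2, 1)],
]
w, b = fit_distributed(sites)

# Pooling all rows at one place gives the same fit, up to rounding.
pooled = [row for data in sites for row in data]
w_pooled, b_pooled = fit_distributed([pooled])
```

Because the log-likelihood is a sum over patients, summing per-site gradients reproduces the gradient of the pooled data, which is why the distributed fit matches the centralized one without any rows being shared.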

Alternative models for training distributed deep learning models

[Figure, panels A-D: Centrally hosted; Ensemble single institution; Single weight transfer; Cyclical weight transfer]

J Am Med Inform Assoc 25(8):945-954, 2018

Cyclical weight transfer has similar performance to centrally-hosted training

[Figure, panels A and B: centrally hosted data, N = 6000 patients; reference level labeled Random classification]

Accuracy increases with number of collaborating institutions

Results based on having 4 institutions

J Am Med Inform Assoc 25(8):945-954, 2018
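As a rough illustration of cyclical weight transfer, the sketch below passes model weights from institution to institution, training briefly at each one and repeating the cycle, so that only weights travel. It is a minimal sketch under assumptions: a two-parameter linear model trained by stochastic gradient descent stands in for a deep network, and the data, learning rate, and cycle count are invented for the example.

```python
def local_training(weights, local_data, learning_rate=0.05, epochs=1):
    """Train on one institution's data, starting from the received weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in local_data:
            error = (w * x + b) - y
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b

def cyclical_weight_transfer(institutions, cycles=200):
    """Pass the weights around the institutions repeatedly; data never moves."""
    weights = (0.0, 0.0)
    for _ in range(cycles):
        for local_data in institutions:
            weights = local_training(weights, local_data)
    return weights

# Hypothetical data: four institutions, each sampling the relation y = 2x + 1.
institutions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (0.5, 2.0)],
    [(1.5, 4.0), (3.0, 7.0)],
    [(2.5, 6.0), (0.2, 1.4)],
]
w, b = cyclical_weight_transfer(institutions)
```

Setting `cycles=1` corresponds to a single pass through the institutions; repeating the cycle lets every institution's data influence the weights many times, which is the intuition behind its closer match to centrally hosted training.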

Summary
Three challenges to scaling up image annotation for deep learning
◦ Varying data/file formats for saving image annotations
– Image annotation standards (AIM) and tools (ePAD)
◦ Difficulty leveraging free text radiology reports as a source for labels for images
– Word embeddings and classification models for information extraction
◦ Hurdles to sharing data across institutions to build more robust AI models
– Distributed computation of deep learning models

Thank you.

Contact info:[email protected]
