Source: on-demand.gputechconf.com/gtc-taiwan/2018/pdf/3-3...
Toward the Future of
AI-Driven Medicine
葉肇元, M.D.
CEO, 雲象科技 (aetherAI)
雲象科技 (aetherAI)
Bring state-of-the-art technology to healthcare.
Our Core
Our Mission
Our Goal
We’re a Medical Image AI company.
Empower medical imaging with A.I.
A survey of deep learning in medical image analysis
• Mammographic Mass Classification
• Diabetic Retinopathy Detection
• Breast Cancer Metastasis Detection
• Brain Lesion Segmentation
• Airway Segmentation of Chest CT Image
• Lung Nodule Detection
• Bone Suppression in X-Ray Image
• Skin Disease Classification
• Prostate Segmentation
Organs of interest for Medical Image AI
• Brain: brain tumor segmentation, disease classification, survival prediction
• Eyes (retina): diabetic retinopathy, cardiac risk factors
• Lungs: lung nodule detection
• Breast: breast cancer screening
• Heart: cardiac image analysis
• Intestine: polyp classification
• Prostate: prostate segmentation
• Bones: age determination
• Skin: disease classification
• Blood vessels: blood vessel segmentation
• Blood: blood cell counting and classification
• Authored by Google, Verily Life Sciences, and Stanford School of Medicine
• Inception-v3 model trained on data from 236,234 patients from EyePACS and 48,101 patients from UK Biobank; validated on data from 12,026 patients from UK Biobank and 999 patients from EyePACS
• Used retinal fundus images to predict:
• Age, gender, smoking status, BMI, systolic blood pressure, diastolic blood pressure
Poplin R, et al. Nature Biomedical Engineering, volume 2, pages 158–164 (2018)
MAE: mean absolute error. For continuous risk factors (such as age), the baseline value is the MAE obtained by predicting the mean value for all patients.
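A minimal sketch of how the MAE and its "predict the mean" baseline can be computed, assuming plain Python lists of true and predicted values. The function names and the sample ages below are hypothetical and are not data from Poplin et al.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all patients."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_baseline_mae(y_true):
    """Baseline MAE: predict the population mean for every patient."""
    mean = sum(y_true) / len(y_true)
    return mean_absolute_error(y_true, [mean] * len(y_true))


# Hypothetical ages and model predictions, for illustration only.
ages = [40, 50, 60, 70]
predicted = [42, 48, 63, 69]
print(mean_absolute_error(ages, predicted))  # 2.0
print(mean_baseline_mae(ages))               # 10.0
```

A model only adds value for a continuous risk factor if its MAE is clearly below this baseline.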
The cost of making medical image AI that is rarely talked about: time
Expected Timeline for a Medical Image AI Project
[Figure: Gantt chart, x-axis time (months); phases: Identify Topic; Collect & Process Data; Train & Validate Model; Collect More Data; Train & Validate Model; Deploy]
Required skill categories:
• Interdisciplinary Knowledge
• Hospital Information System
• AI Software and Hardware
• Healthcare Workflow
In reality...
[Figure: timeline, x-axis time (months); phases: Identify Topic; Collect, Process and Label Data; Train & Validate Model]
Houston, we’ve got a problem.
• So it takes ten months to make one AI model happen (if you’re lucky).
• But there are thousands of clinical tasks that could potentially benefit from the help of A.I.!
• (How on earth can we replace doctors with A.I.?)
How Do We Get There?
[Figure: target timeline, x-axis time (months); phases: Identify Topic; Collect Data; Train and Validate Model; Deploy]
What’s holding us back? Infrastructure.
• Hospital Information System
• AI Software and Hardware
• Interdisciplinary Knowledge
• Healthcare Workflow
Interdisciplinary Knowledge
Essential Ingredient of a Successful Medical Image AI Project: interdisciplinary knowledge of
• Intricacies of medical diagnostic procedures
• Capabilities of different neural network models
• How medical data can be digested by neural networks and turned into insight
Our First Attempt at Digital Pathology AI
• Lymphoma screening using whole slide images
Digging into data: examining raw input
Dark Zone
Light Zone
Follicular Lymphoma
Mantle Zone
Tingible-Body Macrophage
Web interface for deep learning inferencing
Training statistics
Lymphoma Screening Model Used on Whole Slide Image
Improved Tools for Whole Slide Image Labeling
Dataset Statistics
• Labeled training slides: 56 cancer, 56 benign
• Total number of extracted patches
• Validation: ~40,000 patches
• Testing: ~40,000 patches
Extracted patches by class: Benign 4,460,452 | Cancer 147,533 | Background 87,974
Neural Network Architecture
• Modified ResNet-50:
• Dense layer after Global Average Pooling for tissue / background binary classification
• Separate path with additional dense layers for cell type (cancer / benign) classification
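The two-headed design above can be sketched without a framework. This assumes the ResNet-50 backbone has already turned a patch into a C × H × W feature map and shows only the heads: global average pooling, one dense layer for tissue vs. background, and a separate path with an extra hidden dense layer for cancer vs. benign. Layer sizes, weights, and the `params` layout are hypothetical; in the real model they are learned.

```python
import math


def global_average_pool(feature_map):
    """Average each channel (an H x W grid) down to a single number."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def dense(x, weights, bias):
    """Fully connected layer: `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def two_head_predict(feature_map, params):
    """Return (p_tissue, p_cancer) for one patch.

    `params` is a hypothetical dict mapping head names to (weights, bias).
    """
    pooled = global_average_pool(feature_map)
    # Head 1: a single dense layer after pooling, tissue vs. background.
    p_tissue = sigmoid(dense(pooled, *params["tissue"])[0])
    # Head 2: a separate path with an additional hidden dense layer, cancer vs. benign.
    hidden = relu(dense(pooled, *params["cell_hidden"]))
    p_cancer = sigmoid(dense(hidden, *params["cell_out"])[0])
    return p_tissue, p_cancer
```

One natural way to combine the outputs (an assumption, not stated in the talk) is to read `p_cancer` only for patches whose `p_tissue` is high, so background never receives a cancer/benign label.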
Neural Network Training
• Heavy data augmentation
• Flipping (Up-down, left-right), Add, Multiply, Add to Hue and Saturation, Contrast Normalization, Gaussian Blur, Gaussian Noise
• Class balancing: random sampling of an equal number of patches from each class
• Optimizer: Adam
• Early Stopping
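With roughly 4.5 million benign patches against about 150,000 cancer and 90,000 background patches, class balancing matters. A minimal sketch of balanced batch sampling plus a simple early-stopping rule, assuming sampling with replacement and a patience counter on validation loss; both choices, and the names `balanced_batch` and `EarlyStopping`, are illustrative assumptions rather than the talk's exact settings.

```python
import random


def balanced_batch(patches_by_class, batch_size, rng=random):
    """Draw an equal number of patches from every class, with replacement.

    patches_by_class maps a label such as "benign" to a list of patch IDs.
    If batch_size is not divisible by the number of classes, the remainder is dropped.
    """
    labels = sorted(patches_by_class)
    per_class = batch_size // len(labels)
    batch = []
    for label in labels:
        picks = rng.choices(patches_by_class[label], k=per_class)
        batch.extend((patch, label) for patch in picks)
    rng.shuffle(batch)  # avoid long runs of a single class within the batch
    return batch


class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Sampling with replacement lets the rare classes fill their share of every batch; the price is that rare-class patches are seen many times per epoch, which is one reason heavy augmentation pairs well with it.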
Training Result
Foreground / background classification
Benign / Cancer classification
Loss
Accuracy
Statistics of Validation Results
Testing Result
Recall = sensitivity; Precision = positive predictive value (PPV)
Prediction on Separate Test Slide: Prediction by Neural Network vs. Ground Truth
Yellow: cancer; Blue: benign; Red: predicted cancer region
Accuracy: 90.4%, Precision: 93.4%, Recall: 93.0%
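The three reported metrics follow the standard definitions. A minimal sketch computing them from confusion counts, treating "cancer" as the positive class; the counts in the test are hypothetical and do not reproduce the 90.4% / 93.4% / 93.0% above.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision (PPV) and recall (sensitivity) from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted cancer, how much is cancer
    recall = tp / (tp + fn) if tp + fn else 0.0     # of true cancer, how much was found
    return accuracy, precision, recall
```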
Class Activation Map
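The talk shows class activation maps without describing how they were produced. A sketch of the standard class activation mapping formulation, CAM(x, y) = Σ_k w_k · f_k(x, y), where f_k are the last convolutional feature maps and w_k the dense-layer weights for one class. It applies directly to a head where a single dense layer follows global average pooling (like the tissue/background head); the cancer/benign path with extra dense layers would need a variant such as Grad-CAM. Names and inputs are illustrative.

```python
def class_activation_map(feature_map, class_weights):
    """CAM for one class: sum over channels k of w_k * f_k(x, y).

    feature_map: C channels, each an H x W grid (last convolutional layer output).
    class_weights: the C dense-layer weights for the class of interest,
    taken from the layer that directly follows global average pooling.
    """
    height, width = len(feature_map[0]), len(feature_map[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for w, channel in zip(class_weights, feature_map):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w * channel[y][x]
    return cam


def normalize(cam):
    """Rescale to [0, 1] so the map can be overlaid on the patch as a heat map."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in cam]
```

The map has the feature-map resolution (much coarser than the input patch), so it is typically upsampled before being overlaid.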
AI Software and Hardware
• One digital slide is larger than the entire CIFAR-10 dataset
• Digital slide: 80,000 × 60,000 pixels
• CIFAR-10: 32 × 32 × 60,000 pixels
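The comparison works out as follows, counting pixels per channel (both are RGB, so including the three channels leaves the ratio unchanged):

```latex
\begin{aligned}
\text{digital slide} &: 80\,000 \times 60\,000 = 4.8 \times 10^{9}\ \text{pixels} \\
\text{CIFAR-10} &: 32 \times 32 \times 60\,000 = 6.144 \times 10^{7}\ \text{pixels} \\
\text{ratio} &: \frac{4.8 \times 10^{9}}{6.144 \times 10^{7}} = 78.125
\end{aligned}
```

So a single slide of this size holds about 78 times as many pixels as the whole CIFAR-10 dataset.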
Medical Image AI Needs a Lot of Memory
• Medical images have very high spatial resolution:
• Radiography image: 5000 × 4000, uint16
• CT image: 512 × 512 × 300, uint16
• Digital slide: 60000 × 60000 × 3, uint8
• Average ImageNet image: 469 × 387 × 3, uint8
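To make the gap concrete, a short sketch computing the raw, uncompressed size of one image of each type listed above (product of dimensions times bytes per element). This counts the input only; activations stored during training add many times more.

```python
# Bytes per element for the pixel types listed on the slide.
BYTES_PER_ELEMENT = {"uint8": 1, "uint16": 2}

IMAGES = {
    "Radiography image": ((5000, 4000), "uint16"),
    "CT image": ((512, 512, 300), "uint16"),
    "Digital slide": ((60000, 60000, 3), "uint8"),
    "Average ImageNet image": ((469, 387, 3), "uint8"),
}


def raw_size_bytes(shape, dtype):
    """Uncompressed size of one image: product of dimensions x bytes per element."""
    n = 1
    for dim in shape:
        n *= dim
    return n * BYTES_PER_ELEMENT[dtype]


for name, (shape, dtype) in IMAGES.items():
    print(f"{name}: {raw_size_bytes(shape, dtype) / 1e6:,.1f} MB")
```

The digital slide comes to about 10.8 GB raw, roughly 20,000 times the size of an average ImageNet image (about 0.5 MB).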
GPU memory alone is not sufficient for Medical Image AI
• For VGG-16, during training
• A GTX-1080Ti can take an image up to 1200*1200
• A Tesla P40 can take an image up to 1700*1700
• A Tesla V100 can take an image up to 2100*2100
• CUDA unified memory
CUDA Unified Memory in Tensorflow
Specialized Hardware for AI Compute
A BREAKTHROUGH IN TRAINING AND INFERENCE
Each of Tesla V100's 640 Tensor Cores operates on a 4x4 matrix, and their associated data paths are custom-designed to dramatically increase floating-point compute throughput with high energy efficiency.
This key capability enables Volta to deliver 3X performance speedups in training and inference over the previous generation.
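Each Tensor Core performs a fused matrix multiply-accumulate, D = A × B + C, on 4 × 4 tiles, with FP16 inputs A and B and an FP16 or FP32 accumulator. The sketch below only approximates those numerics in plain Python, using `struct` to round values to half and single precision; it says nothing about how the hardware schedules the work, and the helper names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mma_4x4(a, b, c):
    """Approximate D = A x B + C on 4x4 tiles: FP16 inputs, FP32 accumulation."""
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = [[to_fp32(v) for v in row] for row in c]
    for i in range(4):
        for j in range(4):
            acc = d[i][j]
            for k in range(4):
                # A product of two FP16 values fits exactly in FP32;
                # the running sum is rounded back to FP32 after each add.
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            d[i][j] = acc
    return d
```

Rounding inputs to FP16 halves memory traffic, while accumulating in FP32 limits the error that builds up over long dot products, which is why mixed precision is the usual mode for training.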
The Power of Tensor Cores
[Figure: batches per second (y-axis, 0 to 16) for GTX 1080 Ti, TITAN Xp and TITAN V; Float16, batch size 512]
Development environment:
GTX 1080 Ti : Tensorflow 1.4, CUDA 8, cuDNN 5, nvidia-381 driver
Titan Xp : Tensorflow 1.4, CUDA 9, cuDNN 7, nvidia-387 driver
Titan V : Tensorflow 1.4, CUDA 9, cuDNN 7, nvidia-387 driver
Neural Network : Convolution * 6 + fully connected * 2 , trained on cifar-10* 2
[Figure: batches per second (y-axis, 0 to 16) for GTX 1080 Ti, TITAN Xp and TITAN V; Float32, batch size 512]
The GPU Is Often Thirsty: The Importance of Pipelining
[Figure: training time per epoch vs. number of CPUs (1, 2, 4, 8, 16), without queue vs. with queue]
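The chart contrasts training with and without an input queue. The general producer/consumer idea can be sketched with Python's standard `queue` and `threading` modules: CPU worker threads prepare batches into a bounded queue while the training loop consumes them, so the GPU is not left waiting on data loading. `load_batch` and its sleep are hypothetical stand-ins for reading, decoding and augmenting patches; the talk's own pipeline is not shown and may differ.

```python
import queue
import threading
import time


def load_batch(index):
    """Hypothetical CPU-side work: read patches, decode, augment."""
    time.sleep(0.01)  # stand-in for I/O and preprocessing latency
    return [index] * 4  # placeholder "batch"


def prefetching_batches(num_batches, num_workers=4, capacity=8):
    """Yield batches produced by background workers through a bounded queue.

    Batches may arrive out of order when num_workers > 1.
    """
    tasks = queue.Queue()
    for i in range(num_batches):
        tasks.put(i)
    ready = queue.Queue(maxsize=capacity)  # bounded: workers block when it is full
    done = object()  # sentinel each worker sends when the task list is empty

    def worker():
        while True:
            try:
                i = tasks.get_nowait()
            except queue.Empty:
                ready.put(done)
                return
            ready.put(load_batch(i))

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()
    finished = 0
    while finished < num_workers:
        item = ready.get()
        if item is done:
            finished += 1
        else:
            yield item  # the training step would consume this batch
    for t in threads:
        t.join()
```

Usage would look like `for batch in prefetching_batches(1000): train_step(batch)`, where `train_step` is hypothetical. In CPython, threads help mainly when loading is I/O-bound; CPU-heavy augmentation usually calls for worker processes instead, because of the global interpreter lock.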
Hospital Information System
Problems with Existing Hospital Information Systems
• Databases are not tightly connected
• Limited search functions
• The majority of data exists in unstructured formats (.txt, .pdf, etc.)
Unified Web Interface for Medical Image AI
• Web-based system that integrates:
• Clinical data
• Digital slides
• DICOM images / videos
• Deep learning annotation, training and inferencing
Annotation Interface with Structured Reporting
Annotation and Image Markup (AIM)
• An NCI-initiated project that provides a solution to the following imaging challenges:
• No agreed-upon syntax for annotation and markup
• No agreed-upon semantics to describe annotations
• No standard format (for example, DICOM, XML, HL7) for annotations and markup
• Linking semantics to image annotations will help make medical image AI more useful and more interpretable.
https://wiki.nci.nih.gov/display/AIM/Annotation+and+Image+Markup+-+AIM
AIM Example
Medical Record De-Identification
• Due to privacy concerns, AI research requires that personally identifiable information be removed from medical records.
• It’s hard to achieve satisfactory results using regular expressions or other rule-based methods.
• Using tools like NeuroNER (named-entity recognition), we’ve achieved an F1 score of >97% on a public dataset.
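De-identification quality is usually reported as entity-level F1. The talk does not state which matching criterion produced the >97% figure; the sketch below assumes exact matching of (start, end, label) spans, one common convention, and the spans in the test are made up.

```python
def entity_f1(gold, predicted):
    """Exact-match entity-level precision, recall and F1.

    gold, predicted: sets of (start, end, label) spans, e.g. (0, 3, "NAME").
    A prediction counts only if its boundaries and label both match a gold span.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```

For de-identification, recall is usually the metric to watch: a missed name or ID number is a privacy leak, while an over-redacted word costs little.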
Next-Generation IT Infrastructure for an AI-Powered Hospital
• Clinical Terminal: structured reports for both clinical and AI use
• Hybrid Storage: fast tier as a cache for AI training; slow tier as a data archive
• AI Training Server: high compute capacity; job queues for non-stop learning
• Main Server: high availability; advanced database system; job flow control
• AI Inferencing Server: virtualization for on-demand AI inferencing; optimized for inferencing speed
[Diagram: clinical data and AI models flow between these components to deliver an AI-powered diagnostic aid]
Acknowledgement
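• Associate Professor 莊文郁, Department of Pathology, Chang Gung Memorial Hospital
• Director 張尚宏, Big Data and Statistics Center, Chang Gung Memorial Hospital
• Professor 王宗道, Division of Cardiology, National Taiwan University Hospital
• Dr. 李文正, Department of Medical Imaging, National Taiwan University Hospital
• 張哲惟, 雲象科技 (aetherAI)
• 游為翔, 雲象科技 (aetherAI)
• 楊証琨, 雲象科技 (aetherAI)
• 蔡岳霖, 雲象科技 (aetherAI)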